[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-27 Thread tehila at il dot ibm dot com


--- Comment #13 from tehila at il dot ibm dot com  2008-11-27 12:20 ---
(In reply to comment #12)

Thanks, Andrey.
I think there are 2 issues here:
1. register-renaming. (more related to this PR, I think)
2. schedule-insns.
Both of them slow down compilation.
With ARG4, on SPU, I see:
-O1: 9m28.355s
-O1 -fno-rename-registers: 0m19.196s

-O2: 184m37.492s (on the order of 100 minutes, not the 1000 I wrote earlier)
-O2 -fno-rename-registers: 31m29.482s
-O2 -fno-schedule-insns: 10m26.851s
-O2 -fno-rename-registers -fno-schedule-insns: 0m39.425s

Should I open a different PR for scheduling?
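
To make the relative cost of the two passes easier to see, here is a small sketch that turns the ARG4 timings above into speedup factors. The helper names (to_seconds, speedup, print_speedups) are made up for this illustration; only the timings come from the measurements above. It works out to roughly 30x at -O1 with register renaming off, and roughly 280x at -O2 with both passes off.

```c
#include <stdio.h>

/* Convert a "XmY.YYYs" wall-clock reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
  return minutes * 60.0 + seconds;
}

/* Factor by which `fast` beats `slow` (both in seconds). */
static double speedup(double slow, double fast)
{
  return slow / fast;
}

/* Print the speedup of each flag combination over the plain -O level. */
static void print_speedups(void)
{
  const double o1 = to_seconds(9, 28.355);
  const double o2 = to_seconds(184, 37.492);

  printf("-O1 -fno-rename-registers: %.1fx\n",
         speedup(o1, to_seconds(0, 19.196)));
  printf("-O2 -fno-rename-registers: %.1fx\n",
         speedup(o2, to_seconds(31, 29.482)));
  printf("-O2 -fno-schedule-insns: %.1fx\n",
         speedup(o2, to_seconds(10, 26.851)));
  printf("-O2 -fno-rename-registers -fno-schedule-insns: %.1fx\n",
         speedup(o2, to_seconds(0, 39.425)));
}
```

Calling print_speedups () from main prints one line per flag combination.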


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31850



[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-27 Thread tehila at il dot ibm dot com


--- Comment #15 from tehila at il dot ibm dot com  2008-11-27 12:57 ---
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > Thanks, Andrey.
> > I think there are 2 issues here:
> > 1. register-renaming. (more related to this PR, I think)
> > 2. schedule-insns.
> > Both of them slow down compilation.
> > With ARG4, on SPU, I see:
> > -O1: 9m28.355s
> > -O1 -fno-rename-registers: 0m19.196s
> > -O2: 184m37.492s (not 1000 as I wrote, but 100)
> > -O2 -fno-rename-registers: 31m29.482s
> > -O2 -fno-schedule-insns: 10m26.851s
> > -O2 -fno-rename-registers -fno-schedule-insns: 0m39.425s
> Do you see this on ppc to spu cross?  How was your compiler configured?

Yes (ppc(ppu) to spu cross).
Configuration:
--target=spu --disable-shared --disable-threads --disable-checking
--with-headers --with-newlib --with-system-zlib --enable-languages=c
--disable-nls --enable-version-specific-runtime-libs --disable-libssp
--program-prefix=spu   


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31850



[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-25 Thread tehila at il dot ibm dot com


--- Comment #11 from tehila at il dot ibm dot com  2008-11-25 12:17 ---
(In reply to comment #10)
> If you only get slow compilation at -O2 and above then your problem is probably
> due to PR 37790.  The original problem affected -O1 compiles as well as -O2.

PR 37790 doesn't solve the problem I see.
On SPU, with -O1 and with -O2 -fno-schedule-insns the compilation time is long
(timed out == more than 5 minutes), but it's not as long as with -O2:
-O1: 9.5 minutes
-O2 -fno-schedule-insns: 10.5 minutes
-O2: 1000 minutes

I don't know what can be done to improve compilation time on SPU, but
there is certainly a problem in the insn scheduler.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31850



[Bug target/31850] gcc.c-torture/compile/limits-fnargs.c is slow at compiling for spu-elf

2008-11-17 Thread tehila at il dot ibm dot com


--- Comment #9 from tehila at il dot ibm dot com  2008-11-18 07:35 ---
This testcase is indeed very slow on SPU at -O2 and above.
I don't see any slowness at -O1.
If I turn off the insn scheduler (with -fno-schedule-insns) it is much faster:
4x faster for 1,000 args (ARG3), and much more for 10,000 args (ARG4).
It seems that the scheduler generates excessive register pressure by hoisting
loads and sinking stores.
Maybe the scheduler's heuristic for deciding which insn to move should be
improved.
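
For readers who haven't looked at this kind of test, here is a minimal sketch of how a function with a very large number of arguments can be generated with the preprocessor. This is not the source of limits-fnargs.c; the macro and function names (TEN, HUNDRED, PARAM, NAME, ONE, sum100, call_sum100) are invented for the illustration, and it stops at 100 arguments rather than 1,000 or 10,000. Each argument has to be moved into place at the call, which is the kind of straight-line code the scheduler and register renaming then have to process.

```c
/* Expand M ten times, appending the digits 0..9 to the digit d. */
#define TEN(M, d) \
  M(d##0), M(d##1), M(d##2), M(d##3), M(d##4), \
  M(d##5), M(d##6), M(d##7), M(d##8), M(d##9)

/* Expand M one hundred times, with suffixes 00..99. */
#define HUNDRED(M) \
  TEN(M, 0), TEN(M, 1), TEN(M, 2), TEN(M, 3), TEN(M, 4), \
  TEN(M, 5), TEN(M, 6), TEN(M, 7), TEN(M, 8), TEN(M, 9)

#define PARAM(n) int a##n   /* parameter declaration: int a00, int a01, ... */
#define NAME(n)  a##n       /* parameter name: a00, a01, ...                */
#define ONE(n)   1          /* the argument value passed for every slot     */

/* A function taking 100 int parameters; returns their sum. */
static int sum100(HUNDRED(PARAM))
{
  const int v[] = { HUNDRED(NAME) };
  int i, total = 0;

  for (i = 0; i < (int) (sizeof v / sizeof v[0]); i++)
    total += v[i];
  return total;
}

/* Call it with 100 arguments, all equal to 1. */
static int call_sum100(void)
{
  return sum100(HUNDRED(ONE));
}
```

Adding another macro level on top of HUNDRED would scale the same pattern to 1,000 or 10,000 arguments.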


-- 

tehila at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uweigand at de dot ibm dot
                   |                            |com, bergner at vnet dot ibm
                   |                            |dot com, abel at ispras dot
                   |                            |ru, tehila at il dot ibm dot
                   |                            |com, zaks at il dot ibm dot
                   |                            |com, meissner at gcc dot gnu
                   |                            |dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31850



[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-08 Thread tehila at il dot ibm dot com


--- Comment #12 from tehila at il dot ibm dot com  2008-09-08 08:21 ---
(In reply to comment #11)
> (In reply to comment #10)
> > I'm bootstrapping and testing it on x86 now.
> Bootstrap fails (at least on x86_64) (with ICE).
> Tehila.

It fails at tree-ssa-loop-manip.c:424 (approximately; I've changed the file a little), on:
gcc_assert (!def_bb
            || flow_bb_inside_loop_p (def_bb->loop_father, bb));

Error:
In file included from ../../gcc/libiberty/regex.c:638:
../../gcc/libiberty/regex.c: In function 'byte_regex_compile':
../../gcc/libiberty/regex.c:2285: internal compiler error: in
check_loop_closed_ssa_use, at tree-ssa-loop-manip.c:424


Here is some info GDB gives:
#0  check_loop_closed_ssa_use (bb=0x2b9638623f00, use=0x2b963870c460) at
../../gcc/gcc/tree-ssa-loop-manip.c:422
#1  0x009cffdc in check_loop_closed_ssa_stmt (bb=0x2b9638623f00,
stmt=0x2b963822b150) at ../../gcc/gcc/tree-ssa-loop-manip.c:436
#2  0x009d0179 in verify_loop_closed_ssa () at
../../gcc/gcc/tree-ssa-loop-manip.c:466
#3  0x007a14bf in execute_function_todo (data=0x63) at
../../gcc/gcc/passes.c:1004
#4  0x007a0f2e in do_per_function (callback=0x7a11cb
<execute_function_todo>, data=0x63) at ../../gcc/gcc/passes.c:840
#5  0x007a1541 in execute_todo (flags=99) at
../../gcc/gcc/passes.c:1024
#6  0x007a1fc5 in execute_one_pass (pass=0x13494c0) at
../../gcc/gcc/passes.c:1300
#7  0x007a214d in execute_pass_list (pass=0x13494c0) at
../../gcc/gcc/passes.c:1326
#8  0x007a216b in execute_pass_list (pass=0x13490a0) at
../../gcc/gcc/passes.c:1327
#9  0x007a216b in execute_pass_list (pass=0x1348560) at
../../gcc/gcc/passes.c:1327
#10 0x009140c6 in tree_rest_of_compilation (fndecl=0x2b9638292900) at
../../gcc/gcc/tree-optimize.c:418
#11 0x00b49808 in cgraph_expand_function (node=0x2b96382b3f00) at
../../gcc/gcc/cgraphunit.c:1039
#12 0x00b499a2 in cgraph_expand_all_functions () at
../../gcc/gcc/cgraphunit.c:1101
#13 0x00b49f43 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1306
#14 0x0042f789 in c_write_global_declarations () at
../../gcc/gcc/c-decl.c:8080
#15 0x0089c5a9 in compile_file () at ../../gcc/gcc/toplev.c:979
#16 0x0089e3f6 in do_compile () at ../../gcc/gcc/toplev.c:2181
#17 0x0089e45a in toplev_main (argc=30, argv=0x7fff730801f8) at
../../gcc/gcc/toplev.c:2213
#18 0x004d3bb7 in main (argc=30, argv=0x7fff730801f8) at
../../gcc/gcc/main.c:35

stmt is:
D.9310_4596 = b_3442 + -1;

def stmt is:
b_3442 = PHI <b_2310(130), b_620(141)>


HTH,
Tehila.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-04 Thread tehila at il dot ibm dot com


--- Comment #11 from tehila at il dot ibm dot com  2008-09-04 19:46 ---
(In reply to comment #10)
> I'm bootstrapping and testing it on x86 now.
Bootstrap fails (at least on x86_64) (with ICE).

Tehila.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-03 Thread tehila at il dot ibm dot com


--- Comment #10 from tehila at il dot ibm dot com  2008-09-03 06:58 ---
(In reply to comment #9)
> If you give the patch bootstrap & testing I'll approve it for trunk.
> Richard.

Great.
I'm bootstrapping and testing it on x86 now.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] Missed early loop-unroll optimization - causes 40% degradation on SPU

2008-09-02 Thread tehila at il dot ibm dot com


--- Comment #8 from tehila at il dot ibm dot com  2008-09-02 12:47 ---
Thank you, Richard!

This patch indeed does the work and unrolls the loop.
SRA then works fine and we get a 40% improvement.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-26 Thread tehila at il dot ibm dot com


--- Comment #5 from tehila at il dot ibm dot com  2008-08-26 20:47 ---
(In reply to comment #3)
> The meaning here is to the second
> for (j = 0; j < 4; j++)
> loop.
> It's loop #4 in cunrolli pass.
> > cunrolli doesn't recognize # of iterations = 4.
> > I think it doesn't recognize it starts from 0.

We think the problem is that the j = 0 initialization gets hoisted to somewhere
above the loop.
If I add a 'printf' before the loop (i.e., after the if), the loop does get
unrolled, and with the SRA optimization the performance improves.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com


--- Comment #2 from tehila at il dot ibm dot com  2008-08-25 08:18 ---
Andrew, thanks for your response and ideas.

From what we see, if -funroll-loops is on, the loops:

for (j = 0; j < 4; j++)
  arr[j] = mat2[i][j];

and

for (k = 0; k < 3; k++)
  point += (double) mat1[arr[l]][k];

are unrolled by the early unrolling (the cunrolli pass that Richard Guenther
added).
I think the problem is that the loop

for (j = 0; j < 4; j++)

is not unrolled.
cunrolli doesn't recognize that the number of iterations is 4.
I think it doesn't recognize that the loop starts from 0.
Maybe Richard could help us understand why.

Hopefully, if that loop were unrolled, SRA would have the opportunity to
do the transformation we expect it to do.


-- 

tehila at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richard dot guenther at
                   |                            |gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com


--- Comment #3 from tehila at il dot ibm dot com  2008-08-25 08:45 ---
(In reply to comment #2)
> Andrew, thanks for your response and ideas.
> From what we see, if -funroll-loops is on, the loops:
> for (j = 0; j < 4; j++)
> arr[j] = mat2[i][j];
>  and
> for (k = 0; k < 3; k++)
>   point += (double) mat1[arr[l]][k];
> are being unrolled by the early-unrolling (cunrolli pass, that Richard Guenther
> has added).
> I think, the problem is that the loop
> for (j = 0; j < 4; j++)
> is not being unrolled.

Here I mean the second
for (j = 0; j < 4; j++)
loop.
It's loop #4 in the cunrolli pass.

> cunrolli doesn't recognize # of iterations = 4.
> I think it doesn't recognize it starts from 0.
> Maybe Richard could help us understand why.
> Hopefully, if that loop would be unrolled, the SRA will have the opportunity to
> do the transformation we expect it to do.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug middle-end/37221] GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-25 Thread tehila at il dot ibm dot com


--- Comment #4 from tehila at il dot ibm dot com  2008-08-25 14:52 ---
(In reply to comment #2)

> Hopefully, if that loop would be unrolled, the SRA will have the opportunity
> to do the transformation we expect it to do.

I've tried it manually, and that indeed works.
I.e., if we are able to unroll the loop that we currently don't, SRA will do
the work.
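
To illustrate what "manually" means here, below is a sketch of the transformation under discussion: both j loops unrolled by hand, with arr[0..3] replaced by scalars so that nothing has to be stored and reloaded between the two loops. It is a reduced stand-in, not the exact code that was measured: the function names (init_tables, sum_with_array, sum_with_scalars) and the initialization pattern are assumptions made so that the two versions can be compared.

```c
#define N 256
#define M 256

static double mat1[N][3];
static int mat2[M][4];
static int tmp;

/* Fill the tables so that a few rows of mat2 contain tmp. */
static void init_tables(void)
{
  int i, j;

  for (i = 0; i < N; i++)
    for (j = 0; j < 3; j++)
      mat1[i][j] = (double) (i + j);
  for (i = 0; i < M; i++)
    for (j = 0; j < 4; j++)
      mat2[i][j] = (i + j) % N;
  tmp = 33;
}

/* Shape of the testcase: arr[] is stored in one loop, reloaded in another. */
static double sum_with_array(void)
{
  double point = 0;
  int i, j, k;
  int arr[4];

  for (i = 0; i < M; i++)
    {
      for (j = 0; j < 4; j++)
        arr[j] = mat2[i][j];
      if (arr[0] == tmp || arr[1] == tmp || arr[2] == tmp || arr[3] == tmp)
        for (j = 0; j < 4; j++)
          for (k = 0; k < 3; k++)
            point += mat1[arr[j]][k];
    }
  return point;
}

/* Both j loops unrolled by hand; arr[0..3] become the scalars a0..a3. */
static double sum_with_scalars(void)
{
  double point = 0;
  int i;

  for (i = 0; i < M; i++)
    {
      const int a0 = mat2[i][0], a1 = mat2[i][1];
      const int a2 = mat2[i][2], a3 = mat2[i][3];

      if (a0 == tmp || a1 == tmp || a2 == tmp || a3 == tmp)
        {
          point += mat1[a0][0]; point += mat1[a0][1]; point += mat1[a0][2];
          point += mat1[a1][0]; point += mat1[a1][1]; point += mat1[a1][2];
          point += mat1[a2][0]; point += mat1[a2][1]; point += mat1[a2][2];
          point += mat1[a3][0]; point += mat1[a3][1]; point += mat1[a3][2];
        }
    }
  return point;
}
```

After init_tables (), both functions return the same sum; the second form is what we would like unrolling plus SRA to produce from the first.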


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug c/37221] New: GCC for Cell SPU produces poor code when there is load-after-store in different loops

2008-08-24 Thread tehila at il dot ibm dot com
I have the following testcase:
#include <math.h>

#define N 256
#define M 256

double mat1[N][3];
int mat2[M][4];

double point = 0;
int tmp;

double
foo ()
{
  int i, j, k, l, ntimes;
  int arr[4];

  for (ntimes = 0; ntimes < 5000; ntimes++)
    {
      for (i = 0; i < M; i++)
        {
          for (j = 0; j < 4; j++)
            arr[j] = mat2[i][j];
          if (arr[0] == tmp || arr[1] == tmp ||
              arr[2] == tmp || arr[3] == tmp)
            {
              for (j = 0; j < 4; j++)
                for (k = 0; k < 3; k++)
                  point += (double) mat1[arr[j]][k];
            }
        }
    }

  return point;  /* foo is declared to return double */
}


void
init ()
{
  int i;
  for (i = 0; i < N; i++)
    {
      mat1[i][0] = (double) i;
      mat1[i][1] = (double) i + 1;
      mat1[i][2] = (double) i + 2;
    }

  for (i = 0; i < M; i++)
    {
      /* Index with the loop variable i (j was never initialized here).  */
      mat2[i][0] = 0;
      mat2[i][1] = 0;
      mat2[i][2] = 0;
      mat2[i][3] = 0;
    }
  tmp = 33;
}

int
main ()
{
  init ();
  foo ();
}

Is there an option or flag that makes GCC recognize the load-after-store of
arr[0], arr[1], arr[2] and arr[3] (after unrolling) and replace them all
with registers?

Such a transformation would improve the testcase by 40%.

I've tried this on GCC 4.4.0, r139150, with -O3 (adding -funroll-loops -fgcse-las
makes it worse).


-- 
   Summary: GCC for Cell SPU produces poor code when there is load-
after-store in different loops
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tehila at il dot ibm dot com
GCC target triplet: Cell SPU


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37221



[Bug tree-optimization/32826] Reduction into a global variable causes a Load Hit Store Hazard (for the Cell)

2007-07-26 Thread tehila at il dot ibm dot com


--- Comment #2 from tehila at il dot ibm dot com  2007-07-26 10:46 ---
(In reply to comment #2)
Just want a clarification:
I see you're compiling on PPU (since you're using -maltivec).
Is this also a problem on SPU? Does the SPU have this LHS hazard?

Another question:
lwz r0,-20(r1)    <-- LHS hazard
stw r0,lo16(_e)(r2)

The problem here is these 2 insns, right?
The store that is right after (or too close to) the load?
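
For context, here is a sketch of the source-level pattern the PR title refers to, assuming the usual meaning of a load-hit-store hazard: a load that reads an address written by a store that has not yet completed. It is not the PR's testcase; the global name e is borrowed from the _e symbol in the assembly above, and the function names are made up. Accumulating in a local and storing once after the loop avoids a store and reload of the global on every iteration.

```c
static int e;  /* global accumulator, named after _e in the asm above */

/* Reduction straight into the global: each iteration may load e,
   add to it, and store it back, so the next load can hit that store. */
static int reduce_into_global(const int *v, int n)
{
  int i;

  e = 0;
  for (i = 0; i < n; i++)
    e += v[i];
  return e;
}

/* Reduction into a local: the sum can stay in a register, and the
   global is written once, after the loop. */
static int reduce_into_local(const int *v, int n)
{
  int i, sum = 0;

  for (i = 0; i < n; i++)
    sum += v[i];
  e = sum;
  return e;
}
```

Both functions leave the same value in e; only the memory traffic inside the loop differs.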


-- 

tehila at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tehila at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32826



[Bug tree-optimization/32821] New: tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com
#0  first_stmt (bb=0xb7fa75a0) at ../../gcc/gcc/tree-iterator.h:43
#1  0x0838d46e in dump_generic_bb (file=0x9785710, bb=0xb7fa75a0, indent=0,
flags=16448) at ../../gcc/gcc/tree-pretty-print.c:2909
#2  0x0831b8a7 in tree_dump_bb (bb=0xb7fa75a0, outf=0x9785710, indent=0) at
../../gcc/gcc/tree-cfg.c:2206
#3  0x08127144 in dump_bb (bb=0xb7fa75a0, outf=0x9785710, indent=0) at
../../gcc/gcc/cfghooks.c:294
#4  0x08324f4e in remove_bb (bb=0xb7fa75a0) at ../../gcc/gcc/tree-cfg.c:1964
#5  0x0812661d in delete_basic_block (bb=0xb7fa75a0) at
../../gcc/gcc/cfghooks.c:472
#6  0x0835ad2a in combine_blocks (loop=0xb7d73678) at
../../gcc/gcc/tree-if-conv.c:991
#7  0x0835bb2d in tree_if_conversion (loop=0xb7d73678, for_vectorizer=<value
optimized out>) at ../../gcc/gcc/tree-if-conv.c:201
#8  0x0835c813 in main_tree_if_conversion () at
../../gcc/gcc/tree-if-conv.c:1137
#9  0x0829768f in execute_one_pass (pass=0x8823060) at
../../gcc/gcc/passes.c:1125
#10 0x0829788f in execute_pass_list (pass=0x8823060) at
../../gcc/gcc/passes.c:1178
#11 0x082978a2 in execute_pass_list (pass=0x88239a0) at
../../gcc/gcc/passes.c:1179
#12 0x082978a2 in execute_pass_list (pass=0x88231a0) at
../../gcc/gcc/passes.c:1179
#13 0x08375fc2 in tree_rest_of_compilation (fndecl=0xb7d66f00) at
../../gcc/gcc/tree-optimize.c:406
#14 0x084e8da0 in cgraph_expand_function (node=0xb7d66f80) at
../../gcc/gcc/cgraphunit.c:1073
#15 0x084eb500 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1142
#16 0x0805ddd6 in c_write_global_declarations () at ../../gcc/gcc/c-decl.c:7898
#17 0x0831996f in toplev_main (argc=17, argv=0xbfced724) at
../../gcc/gcc/toplev.c:1057
#18 0x080da95f in main (argc=-1210635284, argv=0xb7d72c24) at
../../gcc/gcc/main.c:35


-- 
   Summary: tree-if-conv:combine_blocks with -ftree-dump-tree-all-
details fails on ICE in compilation: segfault
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tehila at il dot ibm dot com
 GCC build triplet: i386-redhat-linux (also powerpc-*-linux)
  GCC host triplet: i386-redhat-linux (also powerpc-*-linux)
GCC target triplet: i386-redhat-linux (also powerpc-*-linux)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821



[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com


--- Comment #1 from tehila at il dot ibm dot com  2007-07-19 13:38 ---
(In reply to comment #0)
> #0  first_stmt (bb=0xb7fa75a0) at ../../gcc/gcc/tree-iterator.h:43
> #1  0x0838d46e in dump_generic_bb (file=0x9785710, bb=0xb7fa75a0, indent=0,
> flags=16448) at ../../gcc/gcc/tree-pretty-print.c:2909
> #2  0x0831b8a7 in tree_dump_bb (bb=0xb7fa75a0, outf=0x9785710, indent=0) at
> ../../gcc/gcc/tree-cfg.c:2206
> #3  0x08127144 in dump_bb (bb=0xb7fa75a0, outf=0x9785710, indent=0) at
> ../../gcc/gcc/cfghooks.c:294
> #4  0x08324f4e in remove_bb (bb=0xb7fa75a0) at ../../gcc/gcc/tree-cfg.c:1964
> #5  0x0812661d in delete_basic_block (bb=0xb7fa75a0) at
> ../../gcc/gcc/cfghooks.c:472
> #6  0x0835ad2a in combine_blocks (loop=0xb7d73678) at
> ../../gcc/gcc/tree-if-conv.c:991
> #7  0x0835bb2d in tree_if_conversion (loop=0xb7d73678, for_vectorizer=<value
> optimized out>) at ../../gcc/gcc/tree-if-conv.c:201
> #8  0x0835c813 in main_tree_if_conversion () at
> ../../gcc/gcc/tree-if-conv.c:1137
> #9  0x0829768f in execute_one_pass (pass=0x8823060) at
> ../../gcc/gcc/passes.c:1125
> #10 0x0829788f in execute_pass_list (pass=0x8823060) at
> ../../gcc/gcc/passes.c:1178
> #11 0x082978a2 in execute_pass_list (pass=0x88239a0) at
> ../../gcc/gcc/passes.c:1179
> #12 0x082978a2 in execute_pass_list (pass=0x88231a0) at
> ../../gcc/gcc/passes.c:1179
> #13 0x08375fc2 in tree_rest_of_compilation (fndecl=0xb7d66f00) at
> ../../gcc/gcc/tree-optimize.c:406
> #14 0x084e8da0 in cgraph_expand_function (node=0xb7d66f80) at
> ../../gcc/gcc/cgraphunit.c:1073
> #15 0x084eb500 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1142
> #16 0x0805ddd6 in c_write_global_declarations () at
> ../../gcc/gcc/c-decl.c:7898
> #17 0x0831996f in toplev_main (argc=17, argv=0xbfced724) at
> ../../gcc/gcc/toplev.c:1057
> #18 0x080da95f in main (argc=-1210635284, argv=0xb7d72c24) at
> ../../gcc/gcc/main.c:35

Sorry, missed this info:
The testcase is very simple:
void main1(int *arr, int n, int a, int b)
{
  int i;
  for (i = 0; i < n; i++)
{
  int m = arr[i];
  arr[i] = (m  a ? m-a : b);
 }
}

It fails while trying to delete a basic block that is unnecessary after
tree-if-conversion (in the dump command that runs before the deletion).
2 comments:
1. It doesn't happen without '-details' (-fdump-tree-all-details).
2. It fails only with -ftree-vectorize on (this is the only way to turn on
tree-if-conversion).
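
As background on what if-conversion does to such a loop, here is a rough sketch. It uses a made-up clamp example rather than the testcase above, and the function names are hypothetical. The pass replaces control flow inside the loop body with a predicate and a select, so that the body becomes a single basic block the vectorizer can handle; the blocks that held the branch arms are then left empty and deleted, which is where the dump fails.

```c
/* Branchy form: the loop body contains control flow. */
static void clamp_branchy(int *arr, int n, int limit)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (arr[i] > limit)
        arr[i] = limit;
    }
}

/* If-converted form: the store is unconditional and a select
   picks the value, so no branch is needed inside the loop. */
static void clamp_if_converted(int *arr, int n, int limit)
{
  int i;

  for (i = 0; i < n; i++)
    {
      int m = arr[i];
      int take_limit = m > limit;       /* predicate */

      arr[i] = take_limit ? limit : m;  /* select */
    }
}
```

Both functions leave the same contents in arr.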


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821



[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com


--- Comment #2 from tehila at il dot ibm dot com  2007-07-19 13:51 ---
(In reply to comment #1)
I've just tried to comment out the code:
if (dump_flags & TDF_DETAILS)
  {
    dump_bb (bb, dump_file, 0);
    fprintf (dump_file, "\n");
  }
from tree-cfg.c (at the beginning of the remove_bb function, lines 1962-1966).
Without this code it compiles OK.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821



[Bug tree-optimization/32821] tree-if-conv:combine_blocks with -ftree-dump-tree-all-details fails on ICE in compilation: segfault

2007-07-19 Thread tehila at il dot ibm dot com


--- Comment #4 from tehila at il dot ibm dot com  2007-07-19 14:15 ---
> No, it ICEs when empty BB is to be pretty-printed. A tree pretty-printer should
> be fixed/updated for this situation, this is all this PR is about.

Thanks for the quick response.
You're right, since the if-conversion cleans the BB before deleting it
(the "Remove labels and make stmts member of loop->header" step).
Anyway, does anyone see this problem in other passes besides tree-if-conv?
Might it be the only pass that exposes this problem?
And also, why do we need to (pretty-)print an empty BB?
I guess we can solve this problem either by changing this dump in pretty-print
(or in the tree-cfg dumps) or by adding a function similar to remove_bb (like
remove_empty_bb).
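
To make the first option concrete, here is a toy model of a dump routine that tolerates an empty block. None of this is GCC code: struct toy_stmt, struct toy_bb and toy_dump_bb are stand-ins invented for the illustration, and the real fix would have to be made in the pretty-printer or the tree-cfg dump code.

```c
#include <stdio.h>
#include <stddef.h>

/* Toy stand-ins for a statement list and a basic block (not GCC's types). */
struct toy_stmt
{
  const char *text;
  struct toy_stmt *next;
};

struct toy_bb
{
  int index;
  struct toy_stmt *first;  /* NULL once the block has been emptied */
};

/* Dump a block, checking for the empty case before walking the
   statements.  Returns the number of statements printed. */
static int toy_dump_bb(FILE *out, const struct toy_bb *bb)
{
  const struct toy_stmt *s;
  int count = 0;

  fprintf(out, ";; basic block %d\n", bb->index);
  if (bb->first == NULL)
    {
      fprintf(out, ";;   (empty)\n");
      return 0;
    }
  for (s = bb->first; s != NULL; s = s->next)
    {
      fprintf(out, "  %s\n", s->text);
      count++;
    }
  return count;
}
```

The second option (a separate remove_empty_bb) would instead skip the dump call entirely for blocks already known to be empty.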


-- 

tehila at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tehila at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32821



[Bug middle-end/31055] New: missed auto-vectorization optimization, when there is float to int conversion

2007-03-06 Thread tehila at il dot ibm dot com
On powerpc, the current auto-vectorizer does not vectorize loops that contain
a float-to-int conversion statement.
For example, consider the following program:

#include <stdarg.h>
#include <stdlib.h>  /* for abort () */

#define N 32

int main1 ()
{
  int i;
  int ib[N];
  float fa[N] =
    {0.2,3.1,6.7,6.9,9.8,12.3,15.4,18.9,21.0,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42};

  /* float -> int */
  for (i = 0; i < N; i++)
    {
      ib[i] = (int) fa[i];
    }

  /* check results:  */
  for (i = 0; i < N; i++)
    {
      if (ib[i] != (int)fa[i])
        abort ();
    }

  return 0;
}

The vectorizer output is:
not vectorized: relevant stmt not supported: D.2488_7 = (int) D.2487_6
since there is no target hook for float to int conversion in rs6000 (altivec).
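
As a rough picture of what vectorizing this loop would amount to, here is a portable sketch of the strip-mined form. It uses no AltiVec code: the inner lane loop only stands in for a single vector conversion, and VF = 4 is an assumption (four 32-bit elements per 128-bit vector). The function names are hypothetical.

```c
#define VF 4  /* assumed vectorization factor: four 32-bit lanes per vector */

/* Scalar reference: one float-to-int conversion per iteration. */
static void convert_scalar(const float *fa, int *ib, int n)
{
  int i;

  for (i = 0; i < n; i++)
    ib[i] = (int) fa[i];
}

/* Strip-mined form: VF conversions per iteration of the main loop,
   followed by a scalar epilogue for the remaining elements. */
static void convert_blocked(const float *fa, int *ib, int n)
{
  int i = 0, lane;

  for (; i + VF <= n; i += VF)
    for (lane = 0; lane < VF; lane++)  /* stands in for one vector convert */
      ib[i + lane] = (int) fa[i + lane];
  for (; i < n; i++)
    ib[i] = (int) fa[i];
}
```

With a target hook for the conversion, the inner lane loop is the part the vectorizer could replace with one vector instruction.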


-- 
   Summary: missed auto-vectorization optimization, when there is
float to int conversion
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tehila at il dot ibm dot com
 GCC build triplet: powerpc-suse-linux
  GCC host triplet: powerpc-suse-linux
GCC target triplet: powerpc-suse-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31055



[Bug tree-optimization/24659] Conversions are not vectorized

2007-01-07 Thread tehila at il dot ibm dot com


--- Comment #7 from tehila at il dot ibm dot com  2007-01-07 08:03 ---
Right, the vectorizer currently supports conversions only between integral
types. Support for type conversions that also involve floating-point types is
in the works.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659