[Bug tree-optimization/23049] [4.1 Regression] ICE with -O3 -ftree-vectorize on 4.1.x

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-09-17 19:31 ---
Please fix the caller instead; it is the one not folding the condition in
the first place.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23049


[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-09-17 18:43 ---
Extra ggc_collect after each optimize_inline_calls does not help reduce it 
further.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928


[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-09-17 18:00 ---
eh-complexity patch from

http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01052.html

slightly edited to apply (and approved by GeoffK in June) helps:

peak memory usage is down to 1.2GB.

 garbage collection    :  17.32 ( 6%) usr   0.84 (11%) sys  18.19 ( 6%) wall       0 kB ( 0%) ggc
 integration           :  29.85 (10%) usr   0.86 (11%) sys  30.90 (10%) wall 2695445 kB (234%) ggc
 tree PTA              :  15.98 ( 6%) usr   0.23 ( 3%) sys  15.50 ( 5%) wall   59710 kB ( 5%) ggc
 tree alias analysis   :  11.35 ( 4%) usr   0.51 ( 7%) sys  12.34 ( 4%) wall   95003 kB ( 8%) ggc
 tree PHI insertion    :   2.19 ( 1%) usr   0.02 ( 0%) sys   2.26 ( 1%) wall   35414 kB ( 3%) ggc
 tree SSA rewrite      :  12.47 ( 4%) usr   0.11 ( 1%) sys  12.80 ( 4%) wall  203797 kB (18%) ggc
 tree SSA other        :   1.81 ( 1%) usr   0.26 ( 3%) sys   2.03 ( 1%) wall    2499 kB ( 0%) ggc
 tree SSA incremental  :  24.40 ( 8%) usr   0.10 ( 1%) sys  24.65 ( 8%) wall   64150 kB ( 6%) ggc
 tree operand scan     :   9.95 ( 3%) usr   1.00 (13%) sys  11.09 ( 4%) wall  116251 kB (10%) ggc
 dominator optimization:  11.43 ( 4%) usr   0.07 ( 1%) sys  11.36 ( 4%) wall  168489 kB (15%) ggc
 TOTAL                 : 288.38             7.62            297.15            1154283 kB


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928


[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-09-17 17:07 ---
ipa-eh patch from

http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00881.html

(with fix) does not really help.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928


[Bug tree-optimization/23928] New: Exceptions require an excessive amount of compile-time memory

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The tramp3d-v4.cpp testcase with flatten (aka leafify) enabled requires
an excessive amount of memory to compile if exceptions are not
disabled via -fno-exceptions.

Compiled with -O2 -Dleafify=flatten -fno-exceptions, mainline needs
at most 670MB of RAM; when -fno-exceptions is omitted it tops out at
2.7GB(!).
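
For readers unfamiliar with the -Dleafify=flatten trick, here is a minimal sketch. It assumes the testcase spells the attribute as __attribute__((leafify)); that is an assumption, since the testcase source is not quoted here. The macro definition then rewrites it to the flatten attribute, which asks the compiler to inline the calls made inside the marked function where possible. All function names below are hypothetical.

```cpp
// Sketch only: 'leafify' is assumed to be the attribute name used in the
// source; on the command line it would be mapped with -Dleafify=flatten.
#ifndef leafify
#define leafify flatten
#endif

// Hypothetical small helpers that we would like inlined into the caller.
static int square(int v) { return v * v; }
static int sum_of_squares(int a, int b) { return square(a) + square(b); }

// With the flatten attribute, the compiler tries to inline the whole call
// tree (sum_of_squares and square) into kernel().
__attribute__((leafify)) int kernel(int a, int b)
{
    return sum_of_squares(a, b) + 1;
}
```

The attribute only changes inlining decisions, not results: kernel(2, 3) is 4 + 9 + 1 either way.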
 
Testing was done on x86_64 with 8GB ram to avoid hitting swap. ggc
params are --param ggc-min-expand=100 --param ggc-min-heapsize=131072.

The tramp3d-v4.cpp testcase is available from
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v4.cpp.gz

-ftime-report from the -fexceptions run shows

Execution times (seconds)
 garbage collection    :  19.16 ( 4%) usr   1.10 (11%) sys  20.33 ( 4%) wall       0 kB ( 0%) ggc
...
 integration           : 188.01 (41%) usr   2.53 (26%) sys 191.29 (40%) wall  842654 kB (24%) ggc
...
 tree CFG cleanup      :  10.57 ( 2%) usr   0.05 ( 1%) sys  10.69 ( 2%) wall   33061 kB ( 1%) ggc
 tree VRP              :   5.18 ( 1%) usr   0.14 ( 1%) sys   5.14 ( 1%) wall   40349 kB ( 1%) ggc
 tree copy propagation :   5.46 ( 1%) usr   0.09 ( 1%) sys   5.56 ( 1%) wall    5073 kB ( 0%) ggc
 tree store copy prop  :   1.10 ( 0%) usr   0.02 ( 0%) sys   0.97 ( 0%) wall    1015 kB ( 0%) ggc
 tree find ref. vars   :   3.96 ( 1%) usr   0.05 ( 1%) sys   4.06 ( 1%) wall  150561 kB ( 4%) ggc
 tree PTA              :  17.47 ( 4%) usr   0.29 ( 3%) sys  17.45 ( 4%) wall   59716 kB ( 2%) ggc
 tree alias analysis   :  12.44 ( 3%) usr   0.61 ( 6%) sys  12.84 ( 3%) wall   95403 kB ( 3%) ggc
 tree PHI insertion    :   2.25 ( 0%) usr   0.02 ( 0%) sys   2.49 ( 1%) wall   35414 kB ( 1%) ggc
 tree SSA rewrite      :  11.87 ( 3%) usr   0.04 ( 0%) sys  11.91 ( 3%) wall  203499 kB ( 6%) ggc
 tree SSA other        :   2.02 ( 0%) usr   0.22 ( 2%) sys   2.46 ( 1%) wall    2499 kB ( 0%) ggc
 tree SSA incremental  :  25.40 ( 6%) usr   0.18 ( 2%) sys  26.07 ( 6%) wall   63750 kB ( 2%) ggc
 tree operand scan     :  10.79 ( 2%) usr   1.18 (12%) sys  12.01 ( 3%) wall  116147 kB ( 3%) ggc
 dominator optimization:  11.64 ( 3%) usr   0.08 ( 1%) sys  12.08 ( 3%) wall  168798 kB ( 5%) ggc
...
 expand                :  15.71 ( 3%) usr   0.07 ( 1%) sys  15.54 ( 3%) wall  194871 kB ( 6%) ggc
...
 TOTAL                 : 461.33             9.78            473.07            3503243 kB

-- 
   Summary: Exceptions require an excessive amount of compile-time
memory
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928


[Bug middle-end/23925] HDF5 check fails--type conversions

2005-09-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-09-17 14:17 ---
Please provide -fno-strict-aliasing with the build CFLAGS.  I bugged the
Debian people to do this once, and this fixed all such issues.
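
As an illustration of the kind of type conversion that breaks under strict aliasing (a hypothetical sketch, not code taken from HDF5): reading a float's bits through a pointer to an unrelated integer type is undefined behavior that -fno-strict-aliasing papers over, while copying the bytes with memcpy is well defined at any optimization level. The function name is made up, and the expected bit pattern assumes IEEE 754 single precision.

```cpp
#include <cstdint>
#include <cstring>

// Well-defined way to reinterpret the bits of a float; works with or
// without -fno-strict-aliasing.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The pattern that relies on -fno-strict-aliasing would instead be
//     return *reinterpret_cast<std::uint32_t *>(&f);
// which the optimizers are allowed to break under -fstrict-aliasing.
```

Fixing the source this way is the longer-term alternative to building with -fno-strict-aliasing.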

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23925


[Bug c++/23372] [4.0/4.1 Regression] Temporary aggregate copy not elided when passing parameters by value

2005-08-13 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-08-13 18:16 ---
Indeed - adding a destructor (or anything else that makes it a non-POD) "fixes"
the problem, too.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372


[Bug c++/23372] [4.0/4.1 Regression] Temporary aggregate copy not elided when passing parameters by value

2005-08-13 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-08-13 18:11 ---
With the copy ctor we end up with

void g(A*) (a)
{
  struct A D.1603;

:
  __comp_ctor  (&D.1603, a);
  f (&D.1603);
  return;

}

which confuses me a bit, because here the prototype of f effectively
looks like

void f(A*);

Do we use ABI information here, but not in the other case?  The C++
frontend in this case presents us with

{
  <
  D.1603 >>>
>) >>>
>>;
}

where in the case w/o the copy ctor we have

  <>) >>>
>>;

Is there some different wording about by-value parameter passing
with or without explicit copy ctor in the C++ standard?!  I.e., why
isn't the above

  <>) >>>
>>;

?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372


[Bug c++/23372] Temporary aggregate copy not elided when passing parameters by value

2005-08-13 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-08-13 14:17 ---
The problem is, we end up with

void g(A*) (a)
{
  struct A D.1608;

:
  D.1608 = *a;
  f (D.1608) [tail call];
  return;

}

after the tree optimizers.  f (*a) would not be gimple, so we create
the temporary in the first place.  TER does not remove this wart,
neither does expand - so we start with two memcpys after RTL expansion.

This is definitely different from PR16405.
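
A hypothetical reduction of the kind of source that leads to the dump above. The PR's actual testcase is not quoted in this comment, so the struct size and the function bodies are assumptions:

```cpp
// Hypothetical POD aggregate, large enough that copies become memcpy calls.
struct A { int a[1024]; };

// Takes its argument by value; the callee owns a private copy.
int f(A x)
{
    x.a[0] += 1;   // modifies only the copy
    return x.a[0];
}

// Semantically, one copy of *a is needed to initialize f's parameter.
// The temporary D.1608 seen in the dump is an extra copy on top of that.
int g(A *a)
{
    return f(*a);
}
```

The caller's object stays untouched, so a single copy into the parameter slot would be enough.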

-- 
   What|Removed |Added

 CC||rguenth at gcc dot gnu dot
   ||org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372


[Bug tree-optimization/22548] Aliasing can not tell array members apart

2005-08-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-08-12 13:02 ---
Subject: Re:  Aliasing can not tell array members
 apart

On 12 Aug 2005, giovannibajo at libero dot it wrote:

> Can you document what's the compile-time effect of raising
> salias-max-array-elements? For instance, how much do we lose in
> bootstrap+tramp3d if we raise it to 16 or even 1024?

I'll do so once I return from holidays.

Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22548


[Bug tree-optimization/23326] [4.0 Regression] Wrong code from forwprop

2005-08-11 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-08-11 17:43 ---
I'll do that.  Though

+ /* If we don't have , then we cannot
+optimize this case.  */
+ if ((cond_code == NE_EXPR || cond_code == EQ_EXPR)
+ && TREE_CODE (TREE_OPERAND (cond, 1)) != INTEGER_CST)
+   continue;

should probably read

+ /* If we don't have , then we cannot
+optimize this case.  */
+ if (!((cond_code == NE_EXPR || cond_code == EQ_EXPR)
+   && TREE_CODE (TREE_OPERAND (cond, 1)) == INTEGER_CST))
+   continue;

because otherwise we might get, for instance, LE_EXPR passing through?  Maybe
the limited context confuses me here, though.

I'll have a look before testing.
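
To spell out the difference between the two conditions, here is a small stand-alone sketch. The enum and the two helper functions are made up for illustration and only mimic the tree codes named in the patch; they are not GCC code.

```cpp
// Stand-ins for the GCC tree codes mentioned above (illustrative only).
enum Code { EQ_EXPR, NE_EXPR, LE_EXPR, INTEGER_CST, SSA_NAME };

// Condition as posted: skip only (in)equality tests whose second operand
// is not a constant.
bool skip_as_posted(Code cond_code, Code op1_code)
{
    return (cond_code == NE_EXPR || cond_code == EQ_EXPR)
           && op1_code != INTEGER_CST;
}

// Suggested condition: skip everything except (in)equality tests against
// a constant.
bool skip_as_suggested(Code cond_code, Code op1_code)
{
    return !((cond_code == NE_EXPR || cond_code == EQ_EXPR)
             && op1_code == INTEGER_CST);
}
```

With the posted condition an LE_EXPR comparison is never skipped, whatever its second operand is; the suggested condition skips it. Both agree on EQ_EXPR and NE_EXPR.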

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23326


[Bug c++/21619] [4.0/4.1 regression] __builtin_constant_p(&"Hello"[0])?1:-1 not compile-time constant

2005-06-01 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-06-01 08:16 ---
Subject: Re:  [4.0/4.1 regression] __builtin_constant_p(&"Hello"[0])?1:-1
 not compile-time constant

On 1 Jun 2005, pinskia at gcc dot gnu dot org wrote:

>
> --- Additional Comments From pinskia at gcc dot gnu dot org  2005-06-01 
> 00:31 ---
> : Search converges between 2004-08-30-trunk (#529) and 2004-08-31-trunk 
> (#530).

Top of cp/ChangeLog for these?  I point my finger at

2004-08-31  Richard Henderson  <[EMAIL PROTECTED]>

PR c++/17221
* pt.c (tsubst_expr): Move OFFSETOF_EXPR handling ...
(tsubst_copy_and_build): ... here.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21619


[Bug tree-optimization/19626] Aliasing says stores to local memory do alias

2005-04-07 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-04-07 12:50 ---
Subject: Re:  Aliasing says stores to local
 memory do alias

On 7 Apr 2005, dberlin at dberlin dot org wrote:

>
> --- Additional Comments From dberlin at gcc dot gnu dot org  2005-04-07 
> 12:48 ---
> Subject: Re:  Aliasing says stores to local
>   memory do alias
>
>
> > Other than that, struct aliasing (or just removing the casts) doesn't fix 
> > the
> > aliasing problems - though struct aliasing doesn't handle array elements at
> > the moment(?).
>
> Correct, it does not.

Ok, at least the RTL optimizers figure out that these stack locals
cannot alias.  Hope we get this for the tree optimizers, too.

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626


[Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much

2005-03-05 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-03-05 19:03 ---
Subject: Re:  [4.0/4.1 Regression] threefold
 performance loss, not inlining as much

steven at gcc dot gnu dot org wrote:
> --- Additional Comments From steven at gcc dot gnu dot org  2005-03-05 
> 18:49 ---
> Even with Richard Guenther's patches, the only thing that really helps is 
> setting --param large-function-growth=200, or more.  The default is 100. 

Yup, this is probably one of the testcases where -fobey-inline would
help.  Or, of course, profile-directed inlining.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863


[Bug middle-end/19775] [3.4/4.0 regression] sqrt(pow(x,y)) != pow(x,y*0.5) (with -ffast-math)

2005-02-07 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-02-07 13:25 ---
Fixed.

-- 
   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775


[Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much

2005-02-03 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-02-03 17:32 ---
Subject: Re:  [4.0 Regression] threefold performance
 loss, not inlining as much

bonzini at gcc dot gnu dot org wrote:

> To the reporter: in this case you probably want __attribute__ ((leafify)), 
> just 
> in case, though you are right in expecting the compiler to inline it.

But of course attribute leafify is not available without patching your 
gcc sources.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863


[Bug middle-end/19775] [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5)

2005-02-03 Thread rguenth at tat dot physik dot uni-tuebingen dot de


-- 
   What|Removed |Added

   Severity|normal  |critical
   Keywords||wrong-code
  Known to fail||3.4.4 4.0.0
  Known to work||3.3.5
   Priority|P2  |P1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775


[Bug middle-end/19775] New: [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5)

2005-02-03 Thread rguenth at tat dot physik dot uni-tuebingen dot de
This one should not abort:

#include <math.h>
#include <stdlib.h>

int main()
{
    double x = -1.0;
    if (sqrt(pow(x,2)) != 1.0)
        abort();
    return 0;
}

but both 3.4.4 and 4.0.0 transform sqrt(pow(x,y)) into pow(x,y*0.5),
which in this case turns sqrt(1.0) into pow(-1.0,1.0), i.e. -1.0.

Ouch.
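
The arithmetic can be checked directly. A small sketch (helper names are illustrative) that evaluates both sides of the transformation; it should be built without -ffast-math so that the compiler does not apply the rewrite itself:

```cpp
#include <cmath>

// Left-hand side: what the source asks for.
double sqrt_of_pow(double x, double y) { return std::sqrt(std::pow(x, y)); }

// Right-hand side: what the transformation computes instead.
double pow_half(double x, double y) { return std::pow(x, y * 0.5); }
```

For x = -1.0 and y = 2.0 the first gives sqrt(1.0) = 1.0 and the second gives pow(-1.0, 1.0) = -1.0, so the identity cannot be applied blindly when x may be negative.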

-- 
   Summary: [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5)
   Product: gcc
   Version: 3.4.4
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775


[Bug tree-optimization/19639] Funny (horrible) code for empty destructor

2005-01-30 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-30 18:54 ---
Subject: Re:  Funny (horrible) code for empty
 destructor

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org  2005-01-29 
> 21:19 ---
> As(In reply to comment #7)
> 
>>Or we could simply unroll the loop completely, but while SCEV finds
>>the IV as
> 
> 
> Again this is most likely because fold does not fold "&x.foo[2] - 4B" to 
> "&x.foo[0]", or someone forgets 
> to call fold on that.  I know that fold_stmt can do it.

Yeah, I can find code to fold &x.foo[i] - c * j, but not without the c * 
mult.  I'll look into this later.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639


[Bug tree-optimization/19639] Funny (horrible) code for empty destructor

2005-01-29 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-29 21:14 ---
Or we could simply unroll the loop completely, but while SCEV finds
the IV as

(set_scalar_evolution
  (scalar = this_6)
  (scalar_evolution = {(struct Foo * const) &x.foo[2] - 4B, +, -4B}_1))
)

it does not know about the number of iterations:

(set_nb_iterations_in_loop = scev_not_known))

  # BLOCK 1 
  # PRED: 3 [100.0%]  (fallthru) 0 [100.0%]  (fallthru,exec)
Invalid sum of incoming frequencies 10258, should be 1
  # thisD.1628_1 = PHI ;
:;
  thisD.1628_6 = thisD.1628_1 - 4;
  if (thisD.1628_6 == &xD.1600.fooD.1587) goto ; else goto ;
  # SUCC: 2 [11.0%]  (loop_exit,true,exec) 3 [89.0%]  (dfs_back,false,exec)
  
  # BLOCK 3 
  # PRED: 1 [89.0%]  (dfs_back,false,exec)
:;
  goto  (); 
  # SUCC: 1 [100.0%]  (fallthru)
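
For an affine induction variable such as {base, +, step} with an equality exit test, the iteration count follows from simple arithmetic. A minimal sketch, using byte offsets relative to &x.foo in place of the pointers. The function name and the use of std::optional are illustrative choices, not GCC's SCEV interface; C++17 is assumed and overflow is ignored.

```cpp
#include <optional>

// Number of times the exit test "iv == bound" has to be evaluated until it
// becomes true, for an induction variable taking the values
// base, base + step, base + 2 * step, ...
// Returns no value if the test would never become true.
std::optional<long> exit_tests_until_equal(long base, long step, long bound)
{
    long distance = bound - base;
    if (step == 0) {
        if (distance == 0)
            return 1L;                // already equal on the first test
        return std::nullopt;          // the variable never moves
    }
    if (distance % step != 0 || distance / step < 0)
        return std::nullopt;          // bound is never hit exactly
    return distance / step + 1;       // +1: the test that finally succeeds
}
```

For the dump above, &x.foo[2] is 8 bytes past &x.foo, so base = 4, step = -4 and bound = 0. The test is evaluated twice, once per array element.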

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639


[Bug middle-end/19402] __builtin_powi? still missing

2005-01-28 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-28 15:29 ---
Looking into it.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19402


[Bug tree-optimization/17640] empty loop not removed after optimization

2005-01-28 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-28 14:26 ---
One patch for empty-loop removal was posted here by Zdenek
http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01679.html

-- 
   What|Removed |Added

 CC||rguenth at tat dot physik
   ||dot uni-tuebingen dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17640


[Bug tree-optimization/19639] Funny (horrible) code for empty destructor

2005-01-28 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-28 14:21 ---
Folding &x.foo[2] == &x.foo to false does not help the testcase, as fold
never sees this comparison.  Instead the initial code the C++ frontend
creates for ctor and dtor of arrays contains temporaries for these already.
It seems the C++ frontend tries to be clever here, creating pointer IVs for
the loop and doing too much manual optimizing.

What other pass than fold() is supposed to handle this sort of simplification?
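
For reference, the loop the frontend emits for the array destructor behaves like the following hand-written version. This is a sketch with a hypothetical counting "destructor" so that the iterations can be observed; the real Foo destructor is empty.

```cpp
// Hypothetical stand-in for Foo; destroy() plays the role of ~Foo().
struct Foo { int i; };

static int destroyed = 0;
static void destroy(Foo *) { ++destroyed; }

// Mirrors the frontend's pointer induction variable: start one past the
// last element and walk down to the first.
int destroy_array_backwards(Foo (&foo)[2])
{
    destroyed = 0;
    Foo *p = foo + 2;
    while (p != foo) {
        --p;
        destroy(p);
    }
    return destroyed;
}
```

With an empty destroy() the loop has no effect at all, which is why one would expect it to be removed completely.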

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-27 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-27 14:53 ---
Bootstrapping and testing completed successfully, but for the testcase

int g(void)
{
   struct { int b[2]; } x;
   return &x.b[0] == &x.b[1];
}

we have lowered the comparison to

 
unit size 
align 32 symtab 0 alias set -1 precision 32 min  max 
pointer_to_this >
invariant
arg 0 
public unsigned SI size  unit size

align 32 symtab 0 alias set -1>
invariant
arg 0 
invariant
arg 0 
arg 0  arg 1 >>>
arg 1 
invariant
arg 0 
invariant
arg 0 
invariant
arg 0 
arg 0  arg 1 >>>
arg 1 >>

and what confuses me is the extra(?) NOP_EXPRs - can I somehow avoid adding
another path for this case?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 18:03 ---
Fails without the patch, too, with the same error.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 17:24 ---
Hmm, it seems it causes

stage1/xgcc -Bstage1/ -B/usr/local/i686-pc-linux-gnu/bin/ -c   -O2 -g
-fomit-frame-pointer -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes
-Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros
-Wold-style-definition -Werror -fno-common   -DHAVE_CONFIG_H-I. -I.
-I/home/rguenth/src/gcc/gcc4.0/gcc -I/home/rguenth/src/gcc/gcc4.0/gcc/.
-I/home/rguenth/src/gcc/gcc4.0/gcc/../include
-I/home/rguenth/src/gcc/gcc4.0/gcc/../libcpp/include 
/home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c -o ggc-page.o
/home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c: In function 'ggc_pch_read':
/home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c:2304: internal compiler error:
Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

#0  0x081da08c in tsi_stmt (i={ptr = 0x0, container = 0x40798d50})
at /home/rguenth/src/gcc/gcc4.0/gcc/tree-iterator.h:93
#1  0x081da5a6 in bsi_stmt (i=
  {tsi = {ptr = 0x0, container = 0x40798d50}, bb = 0x401d4360})
at /home/rguenth/src/gcc/gcc4.0/gcc/tree-flow-inline.h:572
#2  0x081cb6b4 in stmt_after_ip_original_pos (cand=0x88104f8, stmt=0x40832a00)
at /home/rguenth/src/gcc/gcc4.0/gcc/tree-ssa-loop-ivopts.c:613
#3  0x081cb751 in stmt_after_increment (loop=, 
cand=0x88104f8, stmt=0x40832a00)
at /home/rguenth/src/gcc/gcc4.0/gcc/tree-ssa-loop-ivopts.c:635

no time to investigate - maybe an unrelated problem (didn't check if bootstrap
succeeds without patch).

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 16:16 ---
Umm, no.  We fold the ARRAY_REF comparison to

PLUS_EXPR(ADDR_EXPR, INTEGER_CST) == PLUS_EXPR(ADDR_EXPR, INTEGER_CST)

oh well ;)  So I guess transforming &a + i truth_op &a + j to i truth_op j
is always correct, as &a - &a == 0.

For &b[1] == b though, we'll have to do more checks for this.

Patch attached, bootstrap and testing in progress.
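
The reasoning above, sketched as a check (the helper names are illustrative): for elements of non-zero size in the same array, comparing the addresses gives the same answer as comparing the indices, which is what makes the fold valid.

```cpp
#include <cstddef>

// Same-array element addresses compare exactly like their indices, as long
// as the element size is non-zero (always true in standard C++).
template <typename T, std::size_t N>
bool addresses_equal(const T (&a)[N], std::size_t i, std::size_t j)
{
    return &a[i] == &a[j];
}

template <typename T, std::size_t N>
bool addresses_less(const T (&a)[N], std::size_t i, std::size_t j)
{
    return &a[i] < &a[j];
}
```

For int b[2], addresses_equal(b, 0, 1) is false just like 0 == 1, and addresses_less(b, 0, 1) is true just like 0 < 1.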

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 15:30 ---
Ok - I guess it's ARRAY_REFs that are not folded ;)  So the summary could be

"fold misses that two ARRAY_REFs with different offsets into the same array
are obviously not equal".

But I'm not allowed to change that.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 14:54 ---
Subject: Re:  fold misses that two ADDR_EXPR of
 an arrary obvious not equal

On 26 Jan 2005, pinskia at gcc dot gnu dot org wrote:

> (In reply to comment #5)
> > Could we, in general, fold  &a[i] TRUTHOP &a[j] to i TRUTHOP j?  I guess the
> > only special case would be for sizeof(a[i]) == 0 -- but that is not allowed
> > by the standard?  I'll be wading through fold tomorrow and look where to add
> > this transformation.
> sizeof(a[i]) can be zero for other languages besides C++ (C for an example).
> I gave you a hint where this can be fixed by the comment :).

Apart from this, the following should fix it (while bootstrapping I'll
search for truthcode_p() and a way to test the type size):

Index: fold-const.c
===
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.497
diff -u -r1.497 fold-const.c
--- fold-const.c23 Jan 2005 15:05:29 -  1.497
+++ fold-const.c26 Jan 2005 14:53:38 -
@@ -8245,6 +8245,15 @@
  ? code == EQ_EXPR : code != EQ_EXPR,
  type);

+  /* If this is a comparison of two ADDR_EXPRs of the same object
+ and the objects size is not zero, then we can fold this to
+a comparison of the two offsets.  */
+  if ((code == EQ_EXPR || code == NE_EXPR /* FIXME: rest */)
+ && TREE_CODE (arg0) == ADDR_EXPR
+ && TREE_CODE (arg1) == ADDR_EXPR
+ && operand_equal_p (arg0, arg1, 0))
+   return fold (build2 (code, type, TREE_OPERAND (arg0, 1), TREE_OPERAND (arg1, 0)));
+
   if (FLOAT_TYPE_P (TREE_TYPE (arg0)))
{
  tree targ0 = strip_float_extensions (arg0);



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 14:35 ---
Could we, in general, fold  &a[i] TRUTHOP &a[j] to i TRUTHOP j?  I guess the
only special case would be for sizeof(a[i]) == 0 -- but that is not allowed
by the standard?  I'll be wading through fold tomorrow and look where to add
this transformation.

-- 
   What|Removed |Added

 CC|        |rguenth at tat dot physik
   |        |dot uni-tuebingen dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791


[Bug tree-optimization/19639] Funny (horrible) code for empty destructor

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 14:10 ---
We also cannot fold &i[0] == &i[1] to false in

int foo(void)
{
int i[2];
if (&i[0] == &i[1])
return 1;
return 0;
}

or i+0 == i+1 which is transformed to &i[0] == &i[1].

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639


[Bug tree-optimization/19639] New: Funny (horrible) code for empty destructor

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The following simple testcase

struct Foo { ~Foo() {} int i; };
struct NonPod { Foo foo[2]; };
void foo(void)
{
NonPod x;
}

produces(!) at -O2

_Z3foov:
.LFB5:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $16, %esp
.LCFI2:
        leal    -2(%ebp), %edx
        movl    %ebp, %eax
        .p2align 4,,15
.L4:
        decl    %eax
        cmpl    %edx, %eax
        jne     .L4
        leave
        ret

yay!  Looking at the optimized tree-dump, it contains a funny loop:

void foo() ()
{
  struct Foo * const this;
  register struct Foo * D.1621;
  struct Foo[2] * D.1620;
  struct NonPod x;

:
  if (&x.foo[2] == &x.foo) goto ; else goto ;

:;
  this = &x.foo[2];

:;
  this = this - 1;
  if (this == &x.foo) goto ; else goto ;

:;
  return;

}

which is roughly what is generated initially by the C++ frontend
for the dtor:

;; Function NonPod::~NonPod() (_ZN6NonPodD1Ev *INTERNAL* )
;; enabled by -tree-original

{
  <<< Unknown tree: if_stmt  
  1

   >>>
;
  try
{

}
  finally
{
  {
register struct Foo * D.1599;

(if (&((struct NonPod *) this)->foo != 0B)
  {
(void) (D.1599 = &((struct NonPod *) this)->foo + 2);
while (1)
  {
if (&((struct NonPod *) this)->foo == D.1599) break;
(void) (D.1599 = D.1599 - 1);;
__comp_dtor  (NON_LVALUE_EXPR );;
  };
  }
else
  {
0
  });
  }
}
}
:;

Note the same happens for empty struct Foo, but even avoiding
the ambiguous(?) &this->foo[2] - &this->foo[1] doesn't help.

The RTL unroller, if enabled, gets rid of the ugliest stuff from above,
but apparently the tree loop optimizer does not know how to handle this
loop.

_Z3foov:
.LFB5:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $16, %esp
.LCFI2:
        movl    %ebp, %esp
        popl    %ebp
        ret

-- 
   Summary: Funny (horrible) code for empty destructor
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: tree-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639


[Bug tree-optimization/19637] New: Missed constant propagation with placement new

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de
For the following testcase with three similar functions we do different
tree optimizations:

#include <new>

struct Foo {
    Foo() { i[0] = 1; }
    int i[2];
};

int foo_char(void)
{
    int i[2];
    new (reinterpret_cast<char*>(i)) Foo();
    return reinterpret_cast<Foo*>(i)->i[0];
}

int foo_void(void)
{
    int i[2];
    new (reinterpret_cast<void*>(i)) Foo();
    return reinterpret_cast<Foo*>(i)->i[0];
}

int foo_void_offset(void)
{
    int i[2];
    new (reinterpret_cast<void*>(&i[0])) Foo();
    return reinterpret_cast<Foo*>(&i[0])->i[0];
}

We can only optimize the foo_void_offset() variant to return 1; the
foo_void() variant results in

:
  this = (struct Foo *) &i[0];
  this->i[0] = 1;
  i.6 = (struct Foo *) &i;
  return i.6->i[0];

where the difference starts in what the frontend produces:

  (void) (TARGET_EXPR ;
and return  = ((struct Foo *) &i[0])->i[0];

vs.   (void) (TARGET_EXPR ;
and return  = ((struct Foo *) (int *) &i)->i[0];

note that mixing &i[0] and i does not allow folding.


For the char* variant we even cannot prove that &i is non-null (!?):

:
  i.2 = (char *) &i;
  __p = i.2;
  this = (struct Foo *) __p;
  if (__p != 0B) goto ; else goto ;

:;
  this->i[0] = 1;

:;
  i.4 = (struct Foo *) &i;
  return i.4->i[0];

though this might be somehow related to type-based aliasing rules(?).
Note that the char variant does not care if &i[0] or plain i is specified.

-- 
   Summary: Missed constant propagation with placement new
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: tree-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19637


[Bug middle-end/13776] [4.0 Regression] Many C++ compile-time regressions for MICO's ORB code

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 10:24 ---
Subject: Re:  [4.0 Regression] Many C++ compile-time
 regressions for MICO's ORB code

> Bah, I hate profiles for "cc1plus -O2 ir.ii" without peaks:
>
> CPU: P4 / Xeon with 2 hyper-threads, speed 3194.17 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 10
> samples  %symbol name
> 25018 1.6858  walk_tree
> 24322 1.6389  cgraph_node_for_asm
> 19586 1.3198  htab_find_slot_with_hash

Do you have numbers on whether we are memory-bandwidth limited here?  If
not, we might micro-optimize hash table access somewhat more.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776


[Bug tree-optimization/19626] Aliasing says stores to local memory do alias

2005-01-26 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-26 08:47 ---
Subject: Re:  Aliasing says stores to local
 memory do alias

>   D.2540 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
> That confuses the aliasing mechanism
> buffer is of type int* but you are casting it to Loc<1> *.

Telling it the truth by having an array of Loc<1> instead doesn't help.
I suppose you're talking about not decomposing Loc<2> into two
Loc<1> as intermediate step?  Well, yes, that's a design decision I
cannot change.  It looks superfluous for Loc<>, but makes sense for
the more complex domain objects like Interval and Range (but that's
a different story).

But in principle a compiler could determine that the two objects
cannot alias, even with this intertwined type structure?



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626


[Bug tree-optimization/19626] Aliasing says stores to local memory do alias

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-25 16:57 ---
Created an attachment (id=8062)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8062&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626


[Bug tree-optimization/19626] New: Aliasing says stores to local memory do alias

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de
Given the attached testcase, for reference, the interesting function is
this:

int loc_test(void)
{
const Loc<2> dX(1, 0);
const Loc<2> k(0, 1);
return k[0].first() + dX[0].first();
}

aliasing tells us that the initializations of dX and k alias each
other:

:
  D.2540 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
  #   dX_357 = V_MAY_DEF ;
  #   k_358 = V_MAY_DEF ;
  *&(&D.2540->D.2094)->D.2057.domain_m = 1;
  #   dX_365 = V_MAY_DEF ;
  #   k_364 = V_MAY_DEF ;
  *&(&(D.2540 + 4B)->D.2094)->D.2057.domain_m = 0;
  D.2682 = (struct Loc<1> *) &k.D.2210.D.2166.domain_m.buffer;
  #   dX_337 = V_MAY_DEF ;
  #   k_338 = V_MAY_DEF ;
  *&(&D.2682->D.2094)->D.2057.domain_m = 0;
  #   dX_361 = V_MAY_DEF ;
  #   k_63 = V_MAY_DEF ;
  *&(&(D.2682 + 4B)->D.2094)->D.2057.domain_m = 1;
  D.2769 = (struct Loc<1> *) &k.D.2210.D.2166.domain_m.buffer;
  D.2791 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
  return (&D.2769->D.2094)->D.2057.domain_m + 
(&D.2791->D.2094)->D.2057.domain_m;

which is of course (trivially) not true.  This may be obfuscated by
the actual implementation of the template class Loc (see attached
complete testcase).

At the RTL level we are able to optimize this to just return 1, as
expected.  This pessimizes tree loop optimizations if such constructs
are used inside a loop and as induction variable.
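
For readers without the attachment, the shape of the problem can be sketched
with a much simpler stand-in.  The Loc1 and Loc types below are hypothetical
and deliberately avoid the buffer cast and the Loc<2> -> Loc<1> decomposition
of the real POOMA class; the sketch only shows that dX and k are two distinct
automatic objects, so the whole function should fold to the constant 1.

```cpp
#include <cassert>

// Hypothetical stand-in for Loc<1>: a single integer coordinate.
struct Loc1 {
    int domain_m;
    int first() const { return domain_m; }
};

// Hypothetical stand-in for Loc<N>, written only for the N == 2 use below.
template <int N>
class Loc {
public:
    Loc(int a, int b) {
        static_assert(N == 2, "this sketch only models Loc<2>");
        elems_[0].domain_m = a;
        elems_[1].domain_m = b;
    }
    const Loc1& operator[](int i) const { return elems_[i]; }
private:
    Loc1 elems_[N];
};

// Same body as the function in the report.
int loc_test() {
    const Loc<2> dX(1, 0);
    const Loc<2> k(0, 1);
    return k[0].first() + dX[0].first();
}

// Stores into dX can never change k (and vice versa), so the sum is 0 + 1.
void check_loc_test() { assert(loc_test() == 1); }
```

Whether this reduced form still triggers the V_MAY_DEF overlap is not
something the sketch claims; it only states the expected result.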

-- 
           Summary: Aliasing says stores to local memory do alias
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626


[Bug tree-optimization/19624] PRE pessimizes ivopts

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-25 15:27 ---
I guess making PRE and ivopts play together perfectly is nearly
impossible - but any improvement in the 4.0 timeframe is welcome!

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624


[Bug tree-optimization/19624] PRE pessimizes ivopts

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-25 14:52 ---
Oh, in principle this should compile to roughly the same as

void c_test(double *a, double *b, int ei, int ej, int stridea, int strideb)
{
  for (int j=0; j

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624


[Bug tree-optimization/19624] PRE pessimizes ivopts

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-25 14:45 ---
Created an attachment (id=8060)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8060&action=view)
testcase

The testcase is reduced from a complex POOMA program.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624


[Bug tree-optimization/19624] New: PRE pessimizes ivopts

2005-01-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The attached testcase is pessimized by PRE.  Be sure to enable tree-level complete
loop unrolling, e.g. with -O2 -funroll-loops on current mainline.

With PRE, far fewer computations are hoisted out of the inner loop.  Note this
is not a regression from 3.4, which is not able to decompose Loc appropriately
or avoid instantiating temporary objects of this type.

-- 
           Summary: PRE pessimizes ivopts
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624


[Bug tree-optimization/19401] Trivial loop not unrolled

2005-01-24 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-24 09:43 ---
Another one - matrix multiplication:

/* A [NxM], B [MxP] */
#define DOLOOP(N, M, P) \
void matmul ## N ## M ## P(double *res, const double *A, const double *B) \
{ \
int i,j,k; \
    for (k=0; k

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401


[Bug tree-optimization/19516] missed optimization (bool)

2005-01-23 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-23 11:13 ---
How come that, if I change _Bool to int, after tree optimizations we get

foo (flag)
{
  int D.1121;

:
  D.1121_2 = *flag_1;
  if (D.1121_2 != 0) goto ; else goto ;

:;
  bar ();
  D.1121_11 = *flag_1;
  if (D.1121_11 != 0) goto ; else goto ;

:;
  bar () [tail call];

:;
  return;

}

If your analysis were correct, this shouldn't be possible, no?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19516


[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough

2005-01-21 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-21 16:07 ---
Experimenting with SRA inside loop together with cleanup passes after
cunroll/sra didn't reveal anything good - even with loop cfg_cleanup patched in.
See the thread starting at http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01315.html

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754


[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough

2005-01-20 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-20 15:15 ---
Subject: Re:  unrolling happens too late/SRA
 does not happen late enough

On 20 Jan 2005, dberlin at dberlin dot org wrote:

> Wiat, why are we running SRA twice again at all?
> I can't figure this out from the bug report, other than seeing that we
> "could sra c.array", but i don't see why that requires a loop opt first.

We don't run sra twice.  But an early loop unrolling will change, for instance,

  for (unsigned int d=0; d<4; ++d)
    c.array[d] = a.array[d] * b.array[d];

to

c.array[0] = a.array[0] * b.array[0];
c.array[1] = a.array[1] * b.array[1];
c.array[2] = a.array[2] * b.array[2];
c.array[3] = a.array[3] * b.array[3];

and SRA can only scalarize this variant, not the one where the loop is
still there.  That's the whole point of the loop<->sra ordering problem.
And of course sra may then expose new interesting choices for iv's
of outer loops - at least I think.
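
As a self-contained illustration of the two forms (a sketch only; Vec4 and
the function names are made up for this example and are not taken from the
testcase), both functions below compute the same result, but only the second
one accesses the array members with constant indices throughout:

```cpp
#include <cassert>

// Hypothetical aggregate standing in for the a/b/c objects of the testcase.
struct Vec4 {
    float array[4];
};

// Loop form: the array is indexed by the induction variable d.
Vec4 mul_loop(const Vec4& a, const Vec4& b) {
    Vec4 c;
    for (unsigned int d = 0; d < 4; ++d)
        c.array[d] = a.array[d] * b.array[d];
    return c;
}

// Completely unrolled form: every access uses a constant index, which is
// the shape the comment says SRA is able to scalarize.
Vec4 mul_unrolled(const Vec4& a, const Vec4& b) {
    Vec4 c;
    c.array[0] = a.array[0] * b.array[0];
    c.array[1] = a.array[1] * b.array[1];
    c.array[2] = a.array[2] * b.array[2];
    c.array[3] = a.array[3] * b.array[3];
    return c;
}

// Both forms must agree element by element.
void check_mul() {
    const Vec4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    const Vec4 b = {{5.0f, 6.0f, 7.0f, 8.0f}};
    const Vec4 l = mul_loop(a, b);
    const Vec4 u = mul_unrolled(a, b);
    for (int i = 0; i < 4; ++i)
        assert(l.array[i] == u.array[i]);
}
```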

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754


[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough

2005-01-20 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-20 14:57 ---
Subject: Re:  unrolling happens too late/SRA
 does not happen late enough

> Note PR 18755 blocks this if we go the SRA after loop optimization which
> seems like a better idea.

I do not completely understand this sentence ;)  I argue that SRA after
loop is a bad idea, because SRA, in my testcases, will expose new
opportunities for selecting ivs, so we'll need to run another loop after
SRA.  So I opted for

  loop0
  sra
  loop

instead of

  sra
  loop
  sra
  loop

which is one pass less.  Also with -ftree-early-loop-optimize we get
in .vars for PR18755:

;; Function float foobar() (_Z6foobarv)

float foobar() ()
{
:
  return a.array[3] * b.array[3] + b.array[2] * a.array[2] + b.array[1] *
a.array[1] + a.array[0] * b.array[0] + 0.0;

}

which is what we want?  Or do we now just paper over another problem here?

I'm confused...

Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754


[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough

2005-01-20 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-20 10:57 ---
This is also somewhat related to PR19401 as we do not unroll loops completely
with just -O2 at the moment, which is important for the second testcase.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754


[Bug tree-optimization/19507] missed tree-optimization

2005-01-18 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-18 22:29 ---
Done. PR19516.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507


[Bug tree-optimization/19516] New: missed optimization

2005-01-18 Thread rguenth at tat dot physik dot uni-tuebingen dot de
Actually a side-bug of 19507.  The testcase

void bar(void);

void foo(const _Bool *flag)
{
    if (*flag)
        bar();
    if (*flag)
        bar();
}

should be transformed to (at the tree level):

    if (!*flag)
        return;
    bar();
    if (*flag)
        bar();

This is only done at the RTL level at the moment.

Andrew Pinski reports this works, if we exchange _Bool
for int/short/char.
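
To make the claimed equivalence concrete, here is a small sketch (C++ with
bool instead of _Bool; the counting harness and all names other than foo/bar
are assumptions made for the example).  bar() may or may not clear the flag,
and both forms end up calling it the same number of times in every case:

```cpp
#include <cassert>

static int calls = 0;                 // number of bar() invocations
static bool* shared_flag = nullptr;   // lets bar() reach the caller's flag
static bool clear_on_call = false;    // whether bar() clears the flag

void bar() {
    ++calls;
    if (clear_on_call && shared_flag)
        *shared_flag = false;
}

// Form from the testcase: two independent tests.
void foo_original(const bool* flag) {
    if (*flag)
        bar();
    if (*flag)
        bar();
}

// Transformed form: the first test becomes an early return; the second
// test must stay because bar() may have changed *flag.
void foo_threaded(const bool* flag) {
    if (!*flag)
        return;
    bar();
    if (*flag)
        bar();
}

// Runs one variant on a fresh flag and reports how often bar() was called.
int count_calls(void (*f)(const bool*), bool initial, bool clear) {
    bool flag = initial;
    shared_flag = &flag;
    clear_on_call = clear;
    calls = 0;
    f(&flag);
    shared_flag = nullptr;
    return calls;
}

void check_equivalence() {
    for (int initial = 0; initial < 2; ++initial)
        for (int clear = 0; clear < 2; ++clear)
            assert(count_calls(foo_original, initial != 0, clear != 0) ==
                   count_calls(foo_threaded, initial != 0, clear != 0));
}
```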

-- 
           Summary: missed optimization
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19516


[Bug tree-optimization/19507] missed tree-optimization

2005-01-18 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-18 20:10 ---
Subject: Re:  missed tree-optimization

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org  2005-01-18 
> 20:06 ---
> (In reply to comment #1)
> 
>>A C testcase with the missing jump threading(?):
>>
>>void bar(void);
>>
>>void foo(const _Bool *flag)
>>{
>>if (*flag)
>>bar();
>>if (*flag)
>>bar();
>>}
> 
> 
> No this one cannot be optimizated because we can change what is in flag in 
> bar();

I meant this should be transformed to

if (!*flag)
   return;
bar();
if (*flag)
   bar();

this is done at RTL level, but not at tree level.  I should file a 
separate bug for this one, really.

Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507


[Bug tree-optimization/19507] missed tree-optimization

2005-01-18 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-18 16:39 ---
A C testcase with the missing jump threading(?):

void bar(void);

void foo(const _Bool *flag)
{
    if (*flag)
        bar();
    if (*flag)
        bar();
}


A testcase where we are able to thread the jump:

extern long int random(void);

void foo(void)
{
    long int i = random();
    if (i)
        i = random();
    if (i)
        i = random();
}


The difference seems to be that we use .GLOBAL_VAR_10 = V_MAY_DEF <.GLOBAL_VAR_9>;
in the latter while we use TMT.0_9 = V_MAY_DEF ; in the former.
Though, of course, I don't know what either means.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507


[Bug tree-optimization/19507] New: missed tree-optimization

2005-01-18 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The following testcase:


class Flag {
public:
    Flag(bool f) : flag(f) {}
    bool test() const { return flag; }
private:
    const bool flag;
};

void bar(void);

void foo(const Flag& f)
{
    if (f.test())
        bar();
    if (f.test())
        bar();
}


should, from my point of view, generate exactly one test, with the
redundant one optimized away.  I cannot see a way for bar() to modify
Flag::flag that is not ill-formed.

With mainline -O2 -S -fdump-tree-optimized-vops we get for
t63.optimized:

:
  if (f->flag != 0) goto ; else goto ;

:;
  #   TMT.2_17 = V_MAY_DEF ;
  bar ();

:;
  if (f->flag != 0) goto ; else goto ;

:;
  #   TMT.2_16 = V_MAY_DEF ;
  bar () [tail call];

:;
  return;


The RTL optimizers exploit a valid optimization, namely:

_Z3fooRK4Flag:
.LFB6:
        pushl   %ebx            #
.LCFI0:
        subl    $8, %esp        #,
.LCFI1:
        movl    16(%esp), %ebx  # f, f
        cmpb    $0, (%ebx)      # .flag
        jne     .L8             #,
.L6:
        addl    $8, %esp        #,
        popl    %ebx            #
        ret
        .p2align 4,,7
.L8:
        call    _Z3barv         #
        cmpb    $0, (%ebx)      # .flag
        .p2align 4,,4
        je      .L6             #,
        addl    $8, %esp        #,
        popl    %ebx            #
        jmp     _Z3barv         #


where you can see we optimized the function into the equivalent of

   if (!f.test())
     return;
   bar();
   if (!f.test())
     return;
   bar();

Who is supposed to apply the corresponding tree optimization here?

Of course, I think it is valid to omit the second test completely
as there is no valid way for bar() to change Flag::flag.

Note that this may be a frontend issue, as to the tree-optimizers
this may be no different than

void foo(const bool& f)
{
    if (f)
        bar();
    if (f)
        bar();
}

where there of course are valid ways for bar() to change f.
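
One such valid way, as a sketch (global_flag, bar_calls, run and
foo_single_test are names invented for this example): the reference to const
is bound to a non-const global which bar() writes directly.  Dropping the
second test would then change behaviour.

```cpp
#include <cassert>

static bool global_flag = true;   // non-const object behind the reference
static int bar_calls = 0;

// bar() legitimately modifies the object that foo() only sees as const.
void bar() {
    ++bar_calls;
    global_flag = false;
}

// The function from the comment above.
void foo(const bool& f) {
    if (f)
        bar();
    if (f)
        bar();
}

// What foo() would turn into if the second test were folded into the first.
void foo_single_test(const bool& f) {
    if (f) {
        bar();
        bar();
    }
}

// Resets the state, runs one variant on the global and returns the call count.
int run(void (*fn)(const bool&)) {
    global_flag = true;
    bar_calls = 0;
    fn(global_flag);
    return bar_calls;
}

void check_reference_case() {
    assert(run(foo) == 1);              // second test sees the cleared flag
    assert(run(foo_single_test) == 2);  // folded version calls bar() twice
}
```

The Flag case differs only in that the member itself is declared const, which
is what the argument above relies on.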

-- 
           Summary: missed tree-optimization
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507


[Bug middle-end/19402] __builtin_powi? still missing

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at tat dot physik
                   |                            |dot uni-tuebingen dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19402


[Bug tree-optimization/19401] Trivial loop not unrolled

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 16:24 ---
Or stuff often found in C++ libraries:

template 
struct Vector
{
  Vector(float init)
  {
    for (int i=0; i

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401


[Bug tree-optimization/19401] Trivial loop not unrolled

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 16:19 ---
In 3.4 one was able to do this by specifying -fpeel-loops and got complete loop
peeling enabled.  In 4.0 this is also the case, but only for the RTL unroller -
the tree unroller is not affected and as such _this_ unrolling does not help
PR11706.  - Just another datapoint.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401


[Bug libstdc++/11706] std::pow(T, int) implementation pessimizes code

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 16:17 ---
Current status is that with -O2 on mainline we generate the same
(better) code for ::pow(x, 2) and std::pow(x, 2.0), while
std::pow(x, 2) loses because of the lack of unrolling
(PR19401).

Also, ::pow(x, 27) and other exponents will always generate better
code than the std::pow(x, 27) variant due to the technically
superior implementation of gcc/builtins.c:expand_powi.

The attached patch solves all of these problems, unfortunately
in ways the libstdc++ maintainer(s) do not like.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11706


[Bug tree-optimization/19401] Trivial loop not unrolled

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|                       |11706


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401


[Bug tree-optimization/19401] New: Trivial loop not unrolled

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de
We do not unroll the loop in

double foo(double __x)
{
    unsigned int __n = 2;
    double __y = __n % 2 ? __x : 1;

    while (__n >>= 1)
      {
        __x = __x * __x;
        if (__n % 2)
          __y = __y * __x;
      }

    return __y;
}

with -O2 which causes us to emit gratuitously worse code
for std::pow(x, 2) than for std::pow(x, 2.0).

We should definitely get this right without -funroll-loops
and all its side-effects.
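
To spell out what complete unrolling would buy, here is the same loop with
the exponent turned into a parameter (pow_by_squaring and square are names
chosen for this sketch).  For n == 2 the loop body runs exactly once and the
result is just x * x:

```cpp
#include <cassert>

// The loop from the testcase, with the exponent as a parameter.
double pow_by_squaring(double x, unsigned int n) {
    double y = n % 2 ? x : 1;
    while (n >>= 1) {
        x = x * x;
        if (n % 2)
            y = y * x;
    }
    return y;
}

// What the loop reduces to for n == 2 once it is completely unrolled
// and the constant conditions are folded.
double square(double x) { return x * x; }

void check_pow() {
    assert(pow_by_squaring(3.0, 2) == square(3.0));
    assert(pow_by_squaring(2.0, 27) == 134217728.0);  // 2^27, exact in double
    assert(pow_by_squaring(5.0, 0) == 1.0);
}
```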

-- 
           Summary: Trivial loop not unrolled
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401


[Bug rtl-optimization/11707] [3.4 Regression] [new unroller] constants not propagated in unrolled loop iterations with a conditional

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 11:05 ---
I can re-confirm that the patch moves 3.4 to the state of 3.3 - i.e. with an
extra imull compared to 2.95 and 4.0.  The patch has bootstrapped with checking
enabled and -funroll-loops on ia64; testing is in progress.  I'll formally submit
the patch shortly.

For the imull regression I'll file a separate bug with a possibly reduced 
testcase.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot
                   |                            |org
      Known to fail|3.4.0                       |3.4.0 3.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11707


[Bug c++/10611] operations on vector mode not recognized in C++

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 09:44 ---
What is the status on this issue?  I.e. +,-,*,/ on vector types for C++?  Note
that trying to work around this missing feature with operator overloading like

v4sf operator+(const v4sf& a, const v4sf& b)
{
    return __builtin_ia32_addps(a, b);
}

(which would be again machine specific, but anyhow) doesn't work:

t.c:3: error: 'float __vector__ operator+(const float __vector__&, const float
__vector__&)' must have an argument of class or enumerated type.
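
The error refers to the language rule that an overloaded operator needs at
least one parameter of class or enumeration type.  One way to satisfy that
rule is to wrap the vector in a struct.  The sketch below is hypothetical:
Vec4f and its lane member are invented names, and a plain float array stands
in for the v4sf member so that the example does not depend on SSE builtins.

```cpp
#include <cassert>

// Hypothetical wrapper; a real version would hold a v4sf member and
// forward to the machine-specific builtin instead of looping.
struct Vec4f {
    float lane[4];
};

// Allowed: both parameters are of class type.
Vec4f operator+(const Vec4f& a, const Vec4f& b) {
    Vec4f r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

void check_vec_add() {
    const Vec4f a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    const Vec4f b = {{10.0f, 20.0f, 30.0f, 40.0f}};
    const Vec4f c = a + b;
    assert(c.lane[0] == 11.0f);
    assert(c.lane[3] == 44.0f);
}
```

Whether such a wrapper is optimized as well as a bare vector type is a
separate question that this sketch does not answer.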

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10611


[Bug rtl-optimization/13246] [new-ra][meta-bug] new-ra related problems

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de


-- 
Bug 13246 depends on bug 10469, which changed state.

Bug 10469 Summary: constant V4SF loads get moved inside loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10469

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13246


[Bug rtl-optimization/10469] constant V4SF loads get moved inside loop

2005-01-12 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2005-01-12 09:35 ---
I guess we won't ever fix this for 3.3 and new-ra is dead, so this is "fixed".

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
      Known to work|3.4.1                       |3.4.1 3.4.3 4.0.0
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10469


[Bug target/19131] alloca returning unnecessarily aligned pointer and uses too much memory

2004-12-23 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-23 22:23 ---
Subject: Re:  alloca returning unnecessarily aligned pointer
 and uses too much memory

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-22 
> 15:06 ---
> The reason you cannot find anything in the C standard is because this is ABI 
> thing so this is invalid

Where is the ABI specified?  Is it the "System V ABI, Intel386 
Architecture Processor Supplement" document I found at
http://www.caldera.com/developers/devspecs/abi386-4.pdf?
This one talks about word-alignment of the stack, not 16 byte alignment.

> We need to keep the stack aligned sorry.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131


[Bug target/19131] alloca returning unnecessarily aligned pointer and uses too much memory

2004-12-22 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-22 18:16 ---
Subject: Re:  alloca returning unnecessarily aligned pointer
 and uses too much memory

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-22 
> 15:06 ---
> The reason you cannot find anything in the C standard is because this is ABI 
> thing so this is invalid
> 
> We need to keep the stack aligned sorry.

Inside a function!?  Or just at function callsites?  Humm, the Intel 
compiler produces

..B1.3:                         # Preds ..B1.2 ..B1.4
        movl      $4, %eax                                      #5.12
        subl      %eax, %esp                                    #5.12
        andl      $-16, %esp                                    #5.12
        movl      %esp, %eax                                    #5.12
                                # LOE eax ebx ebp esi edi
..B1.4:                         # Preds ..B1.3
        addl      (%eax), %ebx                                  #6.3
        addl      $1, %esi                                      #4.21
        cmpl      %edi, %esi                                    #4.2
        jl        ..B1.3

which looks like it aligns the stack after alloca, but it manages to
waste less space by subtracting $4, not $32.

Also if the ABI says the stack is aligned, why do we not make use of 
this and avoid the andl $-16, %esp -- or is the alignment only about
alloca?
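
The arithmetic behind the subl/andl pair can be modelled in a few lines.
This is only a model of the stack pointer update (alloca_model is a made-up
name, and the addresses are arbitrary), not a statement about what either
compiler does internally:

```cpp
#include <cassert>
#include <cstdint>

// Models "subl $size, %esp; andl $-16, %esp": move the stack pointer down
// by size, then round it down to a multiple of 16.
std::uintptr_t alloca_model(std::uintptr_t sp, std::uintptr_t size) {
    return (sp - size) & ~static_cast<std::uintptr_t>(15);
}

void check_alloca_model() {
    const std::uintptr_t sp = 0x1000;  // already 16-byte aligned

    // Subtracting 4 and masking consumes one 16-byte slot per call.
    assert(sp - alloca_model(sp, 4) == 16);

    // Subtracting 32 up front consumes 32 bytes for the same 4-byte request.
    assert(sp - alloca_model(sp, 32) == 32);

    // The result stays aligned even if the incoming pointer is not.
    assert(alloca_model(0x1004, 4) % 16 == 0);
}
```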

I'm a bit confused.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131


[Bug tree-optimization/19131] New: alloca returning unnecessarily aligned pointer and uses too much memory

2004-12-22 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The testcase

int foo(int bar)
{
    int i, res = 0;
    for (i=0; i

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131


[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough

2004-12-16 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-16 17:08 ---
With the attached patch, -O3 -funroll-loops -ffast-math produces in .vars

float foobar() ()
{
:
  return a.array[3] * b.array[3] + a.array[2] * b.array[2] + a.array[0] *
b.array[0] + a.array[1] * b.array[1];

}

though the assembly is as good as before.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-07 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-07 15:35 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On Tue, 7 Dec 2004, Richard Guenther wrote:

> static inline void foo() {}
> void bar() { foo(); }
>
> which for -O2 -fprofile-generate produces
>
> bar:
>         addl    $1, .LPBX1
>         pushl   %ebp
>         movl    %esp, %ebp
>         adcl    $0, .LPBX1+4
>         addl    $1, .LPBX1+16
>         popl    %ebp
>         adcl    $0, .LPBX1+20
>         addl    $1, .LPBX1+8
>         adcl    $0, .LPBX1+12
>         ret

Mainline manages to produce

bar:
        addl    $1, .LPBX1
        pushl   %ebp
        movl    %esp, %ebp
        adcl    $0, .LPBX1+4
        popl    %ebp
        ret

but that's RTL instrumentation?



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-07 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-07 15:09 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 7 Dec 2004, hubicka at ucw dot cz wrote:

> > Yes, it seems so.  Really nice improvement.  Though profiling is
> > sloow.  I guess you avoid doing any CFG changing transformation
> > for the profiling stage?  I.e. not even inline the simplest functions?
>
> I can inline but only after actually instrumenting the functios.  That
> should minimize the costs, but I also noticed that tramp3d is
> surprisingly a lot slower with profiling.
>
> > That would be the reason the Intel compiler is unusable with profiling
> > for me.  -fprofile-generate comes with a 50fold increase in runtime!
>
> -fprofile-generate is actually package of
> -fprofile-arcs/-fprofile-values + -fprofile-values-transformations
> It might be interesting to figure out whether -fprofile-arcs itslef
> brings similar slowdown.  Only reason why this can happen I can think of
> is the fact that after instrumenting we again inline a lot less or we
> produce too many redundant counter.  Perhaps it would make sense to
> think about inlining functions reducing code size before instrumenting
> as we would do that anyway, but it will be tricky to get gcov output and
> -f* flags independence right then.

Hm.  There are a lot of counters - maybe it is possible to merge
the counters themselves?  The resulting asm of tramp3d-v3 consists
of 30% addl/adcl lines for adding the profiling counts - where
the total number of lines is just wc -l of a -S -fverbose-asm compilation.
That's an awful lot.  And the additions are in a cache-unfriendly sequence,
too - dunno which optimization pass could improve this though.  Consider

static inline void foo() {}
void bar() { foo(); }

which for -O2 -fprofile-generate produces

bar:
        addl    $1, .LPBX1
        pushl   %ebp
        movl    %esp, %ebp
        adcl    $0, .LPBX1+4
        addl    $1, .LPBX1+16
        popl    %ebp
        adcl    $0, .LPBX1+20
        addl    $1, .LPBX1+8
        adcl    $0, .LPBX1+12
        ret

that should be

bar:
        addl    $1, .LPBX1
        pushl   %ebp
        movl    %esp, %ebp
        adcl    $0, .LPBX1+4
        addl    $1, .LPBX1+8
        adcl    $0, .LPBX1+12
        addl    $1, .LPBX1+16
        adcl    $0, .LPBX1+20
        ret

And of course all the three counters could be merged.  But that
would need a changed gcov file format somehow representing a
callgraph with merged edges.
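
For readers not used to i386 assembly: each addl $1 / adcl $0 pair in the
listings above reads as one 64-bit counter increment done in two 32-bit
halves, the second instruction adding the carry into the upper half at
offset +4.  A sketch of that arithmetic (Counter64 and the function names
are invented for the example):

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit counter stored as two 32-bit halves, as on a 32-bit target.
struct Counter64 {
    std::uint32_t lo;
    std::uint32_t hi;
};

void increment(Counter64& c) {
    const std::uint32_t old_lo = c.lo;
    c.lo += 1;                                   // addl $1, counter
    const std::uint32_t carry = c.lo < old_lo ? 1u : 0u;
    c.hi += carry;                               // adcl $0, counter+4
}

std::uint64_t value(const Counter64& c) {
    return (static_cast<std::uint64_t>(c.hi) << 32) | c.lo;
}

void check_counter() {
    Counter64 c = {0xFFFFFFFFu, 0u};
    increment(c);                    // the low half wraps, the carry moves up
    assert(c.lo == 0u && c.hi == 1u);
    assert(value(c) == 0x100000000ull);
}
```

Three edges therefore cost six read-modify-write instructions, which is
consistent with the 30% figure quoted above being dominated by such pairs.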

The Intel compiler is so much worse here because all the
counter adding is done thread-safe in a library (i.e. they
have an extra call for every edge and do not do any inlining).

> How our profilng performance is compared to ICC?

ICC is a lot worse.  ICC with -prof_gen causes a 1 fold slowdown
(if the current snapshot of icc doesn't segfault compiling the tramp3d
testcase) - ICC is completely unusable for me.  So - GCC is great!

> > > It would be nice to experiment with this a little - in general the
> > > heuristics can be viewed as having three players.  There are the limits
> > > (specified via --param) that it must obey, there is the cost model
> > > (estimated growth for inlining into all callees without profiling and
> > > the execute_count to estimated growth for inlining to one call with
> > > profiling) and the bin packing algorithm optimizing the gains while
> > > obeying the limits.
> > >
> > > With profiling in the cost model is pretty much realistic and it would
> > > be nice to figure out how the performance behave when the individual
> > > limits are changed and why.  If you have some time for experimentation,
> > > it would be very usefull.  I am trying to do the same with SPEC and GCC
> > > but I have dificulty to play with pooma or Gerald's application as I
> > > have little understanding what is going there.  I will try it myself
> > > next but any feedback can be very usefull here.
> >
> > I can produce some numbers for the tramp testcase.
> Thanks!  Note that with changling the flags you should not need to
> re-profile now so you can save quite a lot of time.

Ah, that's indeed nice.

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-07 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-07 14:35 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> Looks like I get 4fold speedup on tree profiling with profiling compared
> to tree profiling on mainline that is equivalent to speedup you are
> seeing for leafify patch. That sounds pretty prommising (so the new
> heuristics can get the leafify idea without the hint from user and
> hitting the code growth problems).

Yes, it seems so.  Really nice improvement.  Though profiling is
sloow.  I guess you avoid doing any CFG changing transformation
for the profiling stage?  I.e. not even inline the simplest functions?
That would be the reason the Intel compiler is unusable with profiling
for me.  -fprofile-generate comes with a 50fold increase in runtime!

> It would be nice to experiment with this a little - in general the
> heuristics can be viewed as having three players.  There are the limits
> (specified via --param) that it must obey, there is the cost model
> (estimated growth for inlining into all callees without profiling and
> the execute_count to estimated growth for inlining to one call with
> profiling) and the bin packing algorithm optimizing the gains while
> obeying the limits.
>
> With profiling in the cost model is pretty much realistic and it would
> be nice to figure out how the performance behave when the individual
> limits are changed and why.  If you have some time for experimentation,
> it would be very usefull.  I am trying to do the same with SPEC and GCC
> but I have dificulty to play with pooma or Gerald's application as I
> have little understanding what is going there.  I will try it myself
> next but any feedback can be very usefull here.

I can produce some numbers for the tramp testcase.

> My plan is to try undersand the limits first and then try to get the
> cost model better without profiling as it is bit too clumpsy to do both
> at once.

Do you have some written overview of the cost model?

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-06 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-06 14:31 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> > > the order of inlining decisions affecting this.  I would be curious how
> > > those results compare to leafify and whether the 0m27s is not caused by
> > > missoptimization.
> >
> > You can check for misoptimization by looking at the final output.
> > I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum
> > will increase with the number of iterations.
> >
> > With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
> > -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for
> > unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with
> > leafification turned on, with it turned off, runtime increases
> > to 0m31s with --param inline-unit-growth=175.
>
> I compiled with -O3, would be possible for you to measure how much
> speedup you get on mainline with -O3 and -O3+lefify?  That would
> probably allow me relate those numbers somehow.

0m23s for -O3+leafify, 1m54s for -O3, 0m35s for -O3 --param
inline-unit-growth=150.

Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-06 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-06 13:18 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> The cfg inliner per se is not too interesting.  What matters here is the
> code size esitmation and profitability estimation.  I am playing with
> this now and trying to get profile based inlining working.

Yes, I guess the cfg inliner and some early dead code removal passes
should improve code size metrics for stuff like

template <class X>
struct Foo
{
  enum { val = X::val };
  void foo()
  {
    if (val)
      ...
    else
      ...
  }
};

with val being const.
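
A compilable variant of that pattern, as a sketch: Enabled and Disabled are
hypothetical policy types, and the return values replace the elided branch
bodies.  For each instantiation exactly one branch is live, so any size
estimate that counts both branches overstates the cost of inlining foo().

```cpp
#include <cassert>

// Hypothetical policy types supplying the compile-time constant.
struct Enabled  { enum { val = 1 }; };
struct Disabled { enum { val = 0 }; };

template <class X>
struct Foo {
    enum { val = X::val };

    // The condition is a constant for every instantiation, so one of the
    // two branches is dead code after instantiation.
    int foo() const {
        if (val)
            return 1;   // stands in for the first "..." branch
        else
            return 2;   // stands in for the second "..." branch
    }
};

void check_foo() {
    assert(Foo<Enabled>().foo() == 1);
    assert(Foo<Disabled>().foo() == 2);
}
```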

> For -n10 and tramp3d.cc I need 2m14s on mainline, 1m31s on the current
> tree-profiling.  With my new implementation I need 0m27s with profile
> feedback and 2m53s without.  I wonder what makes the new heuristics work
> worse without profiling, but just increasing the inline-unit-growth very
> slightly (to 155) I get 0m42s.  This might be just a little instability in

Note that inline-unit-growth is 50 by default, so 155 is not slightly
increased.

> the order of inlining decisions affecting this.  I would be curious how
> those results compare to leafify and whether the 0m27s is not caused by
> misoptimization.

You can check for misoptimization by looking at the final output.
I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum
will increase with the number of iterations.

With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
-D__NO_MATH_INLINES (we still need explicit -fpeel-loops for
unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with
leafification turned on; with it turned off, runtime increases
to 0m31s with --param inline-unit-growth=175.
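
As a sketch of the kind of loop meant by for (i=0;i<3;++i) a[i]=0;, the
following shows a constant-trip-count loop next to its fully peeled
equivalent. Vec3, zero_loop and zero_peeled are hypothetical names for
illustration only; the second function is merely what one hopes the
optimizer turns the first into, not a statement about what any particular
GCC version emits.

```cpp
#include <cassert>

// Hypothetical fixed-size vector; not taken from tramp3d.
struct Vec3
{
  double a[3];
};

// Loop form: the trip count (3) is known at compile time.
inline void zero_loop(Vec3 &v)
{
  for (int i = 0; i < 3; ++i)
    v.a[i] = 0;
}

// Hand-written equivalent of peeling all three iterations.
inline void zero_peeled(Vec3 &v)
{
  v.a[0] = 0;
  v.a[1] = 0;
  v.a[2] = 0;
}
```

Both functions leave the vector in the same state; the peeled form simply
has no loop counter, compare or branch left.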

> Unless I observe otherwise (on SPEC with intermodule), I will
> apply my current patch and try to improve the profitability analysis
> without profiling incrementally.  Ideally we ought to build estimated
> profile and use it, but that needs some work so for the moment I guess I
> will try to experiment with making loop depth available to the cgraph
> code.

Yes, loops could be "auto-leafified", but it will be difficult to
statically check if that is worthwhile.

Richard.

--
Richard Guenther 
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-06 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-06 12:33 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 6 Dec 2004, pinskia at gcc dot gnu dot org wrote:

> No reason to keep this one open, there is PR 17863 still.
> Also note I heard from Honza that the tree
> profiling branch with feedback can optimize better than with your
> leafy patch.

I tried tree-profiling branch and profile-based inlining is actually
worse than "normal" inlining with inline-unit-growth=150.  Worse by
a factor of four.  So, no cigar yet.

And btw. profile based inlining seems to be ignorant of inline-unit-growth
(at least it doesn't improve for greater values).

And generating the profile is _very_ slow (for the tramp3d testcase).
Runtime increases about 100-fold - not very good for creating a meaningful
profile.

Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-12-06 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-12-06 09:53 ---
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 6 Dec 2004, pinskia at gcc dot gnu dot org wrote:

> No reason to keep this one open, there is PR 17863 still.  Also note I heard
> from Honza that the tree profiling branch with feedback can optimize better
> than with your leafy patch.

Wow, that would be cool.  Does the tree-profiling branch contain the
cfg inliner?  I'll try it asap.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-11-29 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-11-29 12:10 ---
Documentation patches for 3.4 and mainline are here:

http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02457.html
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02551.html

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

2004-11-29 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
dot de  2004-11-29 11:04 ---
Looking at the 3.4 branch, the defaults for the relevant inlining parameters are
the same.  So the difference in performance has to be attributed to different
tree-node counting (or to differences in the accounting during inlining).

As we throttle inlining params if -Os is specified in opts.c:

  if (optimize_size)
    {
      /* Inlining of very small functions usually reduces total size.  */
      set_param_value ("max-inline-insns-single", 5);
      set_param_value ("max-inline-insns-auto", 5);
      flag_inline_functions = 1;

may I suggest throttling inline-unit-growth there, too (though it
shouldn't have an effect with such a small max-inline-insns-single)?  And
then making the documented limit (150) the default for inline-unit-growth?

One may even argue that limiting overall unit growth is not important,
as it is already limited by max-inline-insns-* and large-function-*.
Also both inline-unit-growth and large-function-growth cause inlining
to stop at the threshold leaving one with an unbalanced inlining decision.
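
To make the "stop at the threshold" effect concrete, here is a toy model.
It is NOT GCC's inliner: Candidate and inline_until_limit are hypothetical
names, and it assumes the growth parameter is a percentage over the
original unit size. Candidates are inlined in order until the budget is
exhausted; everything after that stays a call, no matter how profitable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only. Each candidate call site adds `size` units of code
// to the compilation unit when it is inlined.
struct Candidate
{
  int size;
};

// Inline candidates in order until the unit would grow beyond
// unit_size * (100 + growth_percent) / 100, then stop completely.
// Returns how many candidates were inlined.
inline std::size_t inline_until_limit(const std::vector<Candidate> &cands,
                                      int unit_size, int growth_percent)
{
  const long limit =
    static_cast<long>(unit_size) * (100 + growth_percent) / 100;
  long current = unit_size;
  std::size_t inlined = 0;
  for (std::size_t i = 0; i < cands.size(); ++i)
    {
      if (current + cands[i].size > limit)
        break;                 // budget exhausted: the rest stay calls
      current += cands[i].size;
      ++inlined;
    }
  return inlined;
}
```

With ten equally sized candidates of 20 units in a unit of 100, a limit of
50 inlines only the first two, while a limit of 150 inlines seven; which
call sites end up inlined depends purely on their position in the order.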

Why were these (growth) limits invented?  Were there some particular testcases
that broke down otherwise?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug tree-optimization/18704] New: Inlining limits cause 340% performance regression

2004-11-28 Thread rguenth at tat dot physik dot uni-tuebingen dot de
Compared to 3.4, the default inlining limits in 4.0 cause a 340%
performance regression on the tramp3d-v3.cpp testcase here:
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v3.cpp.gz

The regression can be attributed to the inlining limits, as
patching both compilers with the leafify patch results in the same
performance.

Compilation options used are -Dleafify=fooblah -O2 -fpeel-loops -ffast-math
-march=pentium4 -mfpmath=sse -fno-exceptions.  Binary size is
"improved" by about 9% with the current defaults.

Using --param max-inline-insns-single=1000 worsens the situation to
a

Playing with the inlining params gives

max-inline-insns-single  large-function-growth  inline-unit-growth  regression
                                                                    340%
         1000                                                       375%
                                  500                               348%
                                                       200          -36% (1% size regression)
                                                       175          -35% (4% size improvement)
                                                       165          -12%
                                                       150          -12% (!?)
                                                       100          232%

So I guess, limiting overall unit growth is bad - can we disable limiting at
-Os, or provide a higher default value?  The "correct" value will be different
depending on the application.  Also, the documented default value for
inline-unit-growth is not what it actually seems to be (it is 50 reading
params.def, large-function-growth is also not correctly documented).

If we make the documented values the default, we get a 68% compile time
and a 3.7% code size regression for a 71% performance improvement (this was
including "correcting" the large-function-growth limit, which seems to hurt
rather than help).

-- 
           Summary: Inlining limits cause 340% performance regression
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704


[Bug c++/18296] Misleading diagnostic for recursive template instantiation

2004-11-04 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  
2004-11-04 14:30 ---
Subject: Re:  Misleading diagnostic for recursive template
 instantiation

On 4 Nov 2004, pinskia at gcc dot gnu dot org wrote:

> Confirmed, I think PR 15538 would fix the problem because the class is an
> incomplete type at this point.

Yes, maybe - though icpc (7.1 and 8.0) in this case isn't helpful, either:


tests> icpc -c notype.cpp
notype.cpp(29): error: class "ComponentView" has no member
"Type_t"
typename ComponentView::Type_t
^
  detected during:
instantiation of class "Array [with Dim=1,
T=double, EngineTag=Brick]" at line 19
instantiation of class "ComponentView> [with Dim=1, T=double, EngineTag=Brick]" at line 36

compilation aborted for notype.cpp (code 2)


suspiciously similar to gcc.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18296


[Bug c++/18296] New: Misleading diagnostic for recursive template instantiation

2004-11-04 Thread rguenth at tat dot physik dot uni-tuebingen dot de
template 
struct CompFwd;

struct Brick;
  
template 
struct Engine;
  
template 
class Array;

template  
struct ComponentView;

template
struct ComponentView >
{
  typedef Array Subject_t;
  typedef typename Subject_t::Engine_t Engine_t;
  typedef Array > Type_t;
};

template 
struct Array
{
  typedef Engine Engine_t;
  typedef Array This_t;

  typename ComponentView::Type_t
  comp(int i1) const;
};

typedef Array<1, double, Brick> Array_t;
typedef ComponentView::Type_t CView_t;


causes g++ to emit:
tests> g++-3.4 -c notype.cpp  
notype.cpp: In instantiation of `Array<1, double, Brick>':
notype.cpp:19:   instantiated from `ComponentView'
notype.cpp:36:   instantiated from here
notype.cpp:30: error: no type named `Type_t' in `struct ComponentView'

which could be improved to mention that the missing type is caused by an
aborted recursive instantiation of struct ComponentView.  At the moment the
diagnostic is at least misleading, as there is a Type_t in struct
ComponentView.

-- 
           Summary: Misleading diagnostic for recursive template instantiation
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18296


[Bug tree-optimization/13776] [4.0 Regression] [tree-ssa] Many C++ compile-time regression in 4.0-tree-ssa 040120

2004-10-25 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  
2004-10-25 13:02 ---
Subject: Re:  [4.0 Regression] [tree-ssa] Many
 C++ compile-time regression in 4.0-tree-ssa 040120

And
http://gcc.gnu.org/ml/gcc/2004-10/msg00955.html



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776


[Bug c/18042] [4.0 regression] does not handle struct initializer

2004-10-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  
2004-10-17 21:51 ---
Created an attachment (id=7369)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7369&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18042


[Bug c/18042] New: [4.0 regression] does not handle struct initializer

2004-10-17 Thread rguenth at tat dot physik dot uni-tuebingen dot de
The testcase is rejected with

> gcc -c const.c
const.c:25: error: initializer element is not constant

The testcase is fine with any previous version of gcc.

This is mainline from 20041017; a version from about two months ago was ok.

-- 
           Summary: [4.0 regression] does not handle struct initializer
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18042


[Bug c++/10479] alignof and sizeof (and other expressions) in attributes does not compile inside template classes

2004-10-16 Thread rguenth at tat dot physik dot uni-tuebingen dot de

--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  
2004-10-16 15:42 ---
Subject: Re:  alignof and sizeof (and other expressions) in
 attributes does not compile inside template classes

giovannibajo at libero dot it wrote:
> --- Additional Comments From giovannibajo at libero dot it  2004-10-16 11:06 
> ---
> Fixed in GCC 4.0.0. Thanks for your report!

Can this be trivially backported to 3.4?  That would be cool.

Thanks,
Richard.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10479