[Bug tree-optimization/23049] [4.1 Regression] ICE with -O3 -ftree-vectorize on 4.1.x
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-09-17 19:31 --- Please fix the caller who is not folding the condition in the first place instead. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23049
[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-09-17 18:43 --- Extra ggc_collect after each optimize_inline_calls does not help reduce it further. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928
[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-09-17 18:00 ---
The eh-complexity patch from http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01052.html, slightly edited to apply (and approved by GeoffK in June), helps: peak memory usage is down to 1.2GB.

 garbage collection    :  17.32 ( 6%) usr   0.84 (11%) sys  18.19 ( 6%) wall       0 kB ( 0%) ggc
 integration           :  29.85 (10%) usr   0.86 (11%) sys  30.90 (10%) wall 2695445 kB (234%) ggc
 tree PTA              :  15.98 ( 6%) usr   0.23 ( 3%) sys  15.50 ( 5%) wall   59710 kB ( 5%) ggc
 tree alias analysis   :  11.35 ( 4%) usr   0.51 ( 7%) sys  12.34 ( 4%) wall   95003 kB ( 8%) ggc
 tree PHI insertion    :   2.19 ( 1%) usr   0.02 ( 0%) sys   2.26 ( 1%) wall   35414 kB ( 3%) ggc
 tree SSA rewrite      :  12.47 ( 4%) usr   0.11 ( 1%) sys  12.80 ( 4%) wall  203797 kB (18%) ggc
 tree SSA other        :   1.81 ( 1%) usr   0.26 ( 3%) sys   2.03 ( 1%) wall    2499 kB ( 0%) ggc
 tree SSA incremental  :  24.40 ( 8%) usr   0.10 ( 1%) sys  24.65 ( 8%) wall   64150 kB ( 6%) ggc
 tree operand scan     :   9.95 ( 3%) usr   1.00 (13%) sys  11.09 ( 4%) wall  116251 kB (10%) ggc
 dominator optimization:  11.43 ( 4%) usr   0.07 ( 1%) sys  11.36 ( 4%) wall  168489 kB (15%) ggc
 TOTAL                 : 288.38             7.62           297.15            1154283 kB
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928
[Bug tree-optimization/23928] Exceptions require an excessive amount of compile-time memory
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-09-17 17:07 --- ipa-eh patch from http://gcc.gnu.org/ml/gcc-patches/2005-09/msg00881.html (with fix) does not really help. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928
[Bug tree-optimization/23928] New: Exceptions require an excessive amount of compile-time memory
The tramp3d-v4.cpp testcase with flatten (aka leafify) enabled requires an excessive amount of memory to compile if exceptions are not disabled via -fno-exceptions. Compiling with -O2 -Dleafify=flatten -fno-exceptions, mainline needs at most 670MB of RAM, while omitting -fno-exceptions it tops out at 2.7GB(!). Testing was done on x86_64 with 8GB RAM to avoid hitting swap. ggc params are --param ggc-min-expand=100 --param ggc-min-heapsize=131072.

The tramp3d-v4.cpp testcase is available from http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v4.cpp.gz

-ftime-report from the -fexceptions run shows

Execution times (seconds)
 garbage collection    :  19.16 ( 4%) usr   1.10 (11%) sys  20.33 ( 4%) wall       0 kB ( 0%) ggc
...
 integration           : 188.01 (41%) usr   2.53 (26%) sys 191.29 (40%) wall  842654 kB (24%) ggc
...
 tree CFG cleanup      :  10.57 ( 2%) usr   0.05 ( 1%) sys  10.69 ( 2%) wall   33061 kB ( 1%) ggc
 tree VRP              :   5.18 ( 1%) usr   0.14 ( 1%) sys   5.14 ( 1%) wall   40349 kB ( 1%) ggc
 tree copy propagation :   5.46 ( 1%) usr   0.09 ( 1%) sys   5.56 ( 1%) wall    5073 kB ( 0%) ggc
 tree store copy prop  :   1.10 ( 0%) usr   0.02 ( 0%) sys   0.97 ( 0%) wall    1015 kB ( 0%) ggc
 tree find ref. vars   :   3.96 ( 1%) usr   0.05 ( 1%) sys   4.06 ( 1%) wall  150561 kB ( 4%) ggc
 tree PTA              :  17.47 ( 4%) usr   0.29 ( 3%) sys  17.45 ( 4%) wall   59716 kB ( 2%) ggc
 tree alias analysis   :  12.44 ( 3%) usr   0.61 ( 6%) sys  12.84 ( 3%) wall   95403 kB ( 3%) ggc
 tree PHI insertion    :   2.25 ( 0%) usr   0.02 ( 0%) sys   2.49 ( 1%) wall   35414 kB ( 1%) ggc
 tree SSA rewrite      :  11.87 ( 3%) usr   0.04 ( 0%) sys  11.91 ( 3%) wall  203499 kB ( 6%) ggc
 tree SSA other        :   2.02 ( 0%) usr   0.22 ( 2%) sys   2.46 ( 1%) wall    2499 kB ( 0%) ggc
 tree SSA incremental  :  25.40 ( 6%) usr   0.18 ( 2%) sys  26.07 ( 6%) wall   63750 kB ( 2%) ggc
 tree operand scan     :  10.79 ( 2%) usr   1.18 (12%) sys  12.01 ( 3%) wall  116147 kB ( 3%) ggc
 dominator optimization:  11.64 ( 3%) usr   0.08 ( 1%) sys  12.08 ( 3%) wall  168798 kB ( 5%) ggc
...
 expand                :  15.71 ( 3%) usr   0.07 ( 1%) sys  15.54 ( 3%) wall  194871 kB ( 6%) ggc
...
 TOTAL                 : 461.33             9.78           473.07            3503243 kB

--
           Summary: Exceptions require an excessive amount of compile-time memory
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23928
[Bug middle-end/23925] HDF5 check fails--type conversions
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-09-17 14:17 --- Please provide -fno-strict-aliasing with the build CFLAGS. I bugged the Debian people to do this once, and this fixed all such issues. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23925
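For reference, the kind of code that usually needs -fno-strict-aliasing is type punning through a pointer cast. The following is only a sketch and is not taken from HDF5; the function name is made up for illustration, and it assumes a 32-bit IEEE float. Copying the object representation with memcpy is well defined with or without the flag, whereas the cast shown in the comment is not:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not HDF5 code.
//
// The problematic pattern reads a float object through an lvalue of an
// unrelated type, e.g.
//     std::uint32_t u = *reinterpret_cast<const std::uint32_t *>(&f);
// which the optimizers may miscompile unless -fno-strict-aliasing is given.

// Well-defined alternative: copy the bytes instead of aliasing them.
std::uint32_t bits_by_memcpy(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Small self-check, assuming IEEE 754 single precision.
void check_bits() {
    assert(bits_by_memcpy(0.0f) == 0u);
    assert(bits_by_memcpy(1.0f) == 0x3f800000u);
}
```

Compilers generally turn the memcpy into a plain register move, so the portable form should not cost anything at -O2.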
[Bug c++/23372] [4.0/4.1 Regression] Temporary aggregate copy not elided when passing parameters by value
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-08-13 18:16 --- Indeed - adding a destructor (or anything else that makes it a non-POD) "fixes" the problem, too. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372
[Bug c++/23372] [4.0/4.1 Regression] Temporary aggregate copy not elided when passing parameters by value
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-08-13 18:11 --- With the copy ctor we end up with void g(A*) (a) { struct A D.1603; : __comp_ctor (&D.1603, a); f (&D.1603); return; } which confuses me a bit, because here the prototype of f looks like effectively void f(A*); do we use ABI information here, but not in the other case? The C++ frontend in this case presents us with { < D.1603 >>> >) >>> >>; } where in the case w/o the copy ctor we have <>) >>> >>; is there some different wording about by-value parameter passing with or without explicit copy ctor in the C++ standard?! I.e., why isn't the above <>) >>> >>; ? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372
[Bug c++/23372] Temporary aggregate copy not elided when passing parameters by value
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-08-13 14:17 ---
The problem is that we end up with

void g(A*) (a)
{
  struct A D.1608;

:
  D.1608 = *a;
  f (D.1608) [tail call];
  return;

}

after the tree optimizers. f (*a) would not be gimple, so we create the temporary in the first place. TER does not remove this wart, and neither does expand, so we start with two memcpys after RTL expansion. This is definitely different from PR16405.

--
           What    |Removed                     |Added
                 CC|                            |rguenth at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23372
[Bug tree-optimization/22548] Aliasing can not tell array members apart
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-08-12 13:02 --- Subject: Re: Aliasing can not tell array members apart On 12 Aug 2005, giovannibajo at libero dot it wrote: > Can you document what's the compile-time effect of raising salias-max-array- > elements? For instance, how much do we lose in bootstrap+tramp3d if we raise > it > to 16 or even 1024? I'll do so once I return from holidays. Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22548
[Bug tree-optimization/23326] [4.0 Regression] Wrong code from forwprop
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-08-11 17:43 ---
I'll do that. Though

+      /* If we don't have , then we cannot
+         optimize this case.  */
+      if ((cond_code == NE_EXPR || cond_code == EQ_EXPR)
+          && TREE_CODE (TREE_OPERAND (cond, 1)) != INTEGER_CST)
+        continue;

should probably read

+      /* If we don't have , then we cannot
+         optimize this case.  */
+      if (!((cond_code == NE_EXPR || cond_code == EQ_EXPR)
+            && TREE_CODE (TREE_OPERAND (cond, 1)) == INTEGER_CST))
+        continue;

because otherwise we might get, for instance, LE_EXPR passing through? Maybe the limited context confuses me here, though. I'll have a look before testing.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23326
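To make the difference between the two guards concrete, here is a small sketch that models only their boolean structure. The enum and the function names are stand-ins invented for this note, not GCC's real tree codes or forwprop code:

```cpp
#include <cassert>

// Stand-ins for the tree codes; hypothetical, for illustration only.
enum CondCode { EQ_EXPR, NE_EXPR, LE_EXPR };

// Guard as posted: skips only EQ/NE comparisons against a non-constant,
// so any other comparison code falls through to the optimization.
bool skip_as_posted(CondCode code, bool op1_is_integer_cst) {
    return (code == NE_EXPR || code == EQ_EXPR) && !op1_is_integer_cst;
}

// Guard as suggested: skips everything except EQ/NE against a constant.
bool skip_as_suggested(CondCode code, bool op1_is_integer_cst) {
    return !((code == NE_EXPR || code == EQ_EXPR) && op1_is_integer_cst);
}

// The two guards agree on EQ/NE and disagree on other codes such as LE.
void check_guards() {
    assert(!skip_as_posted(EQ_EXPR, true) && !skip_as_suggested(EQ_EXPR, true));
    assert(skip_as_posted(NE_EXPR, false) && skip_as_suggested(NE_EXPR, false));
    assert(!skip_as_posted(LE_EXPR, true));   // LE passes through
    assert(skip_as_suggested(LE_EXPR, true)); // LE is rejected
}
```

Whether other comparison codes can reach this point at all depends on context that is not visible in the quoted hunk.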
[Bug c++/21619] [4.0/4.1 regression] __builtin_constant_p(&"Hello"[0])?1:-1 not compile-time constant
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-06-01 08:16 --- Subject: Re: [4.0/4.1 regression] __builtin_constant_p(&"Hello"[0])?1:-1 not compile-time constant On 1 Jun 2005, pinskia at gcc dot gnu dot org wrote: > > --- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-01 > 00:31 --- > : Search converges between 2004-08-30-trunk (#529) and 2004-08-31-trunk > (#530). Top of cp/ChangeLog for these? I point my finger at 2004-08-31 Richard Henderson <[EMAIL PROTECTED]> PR c++/17221 * pt.c (tsubst_expr): Move OFFSETOF_EXPR handling ... (tsubst_copy_and_build): ... here. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21619
[Bug tree-optimization/19626] Aliasing says stores to local memory do alias
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-04-07 12:50 --- Subject: Re: Aliasing says stores to local memory do alias On 7 Apr 2005, dberlin at dberlin dot org wrote: > > --- Additional Comments From dberlin at gcc dot gnu dot org 2005-04-07 > 12:48 --- > Subject: Re: Aliasing says stores to local > memory do alias > > > > Other than that, struct aliasing (or just removing the casts) doesn't fix > > the > > aliasing problems - though struct aliasing doesn't handle array elements at > > the moment(?). > > Correct, it does not. Ok, at least the RTL optimizers figure out that these stack locals cannot alias. Hope we get this for the tree optimizers, too. Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626
[Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-03-05 19:03 --- Subject: Re: [4.0/4.1 Regression] threefold performance loss, not inlining as much steven at gcc dot gnu dot org wrote: > --- Additional Comments From steven at gcc dot gnu dot org 2005-03-05 > 18:49 --- > Even with Richard Guenther's patches, the only thing that really helps is > setting --param large-function-growth=200, or more. The default is 100. Yup, this is probably one of the testcases, where -fobey-inline would help. Or of course profile directed inlining. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
[Bug middle-end/19775] [3.4/4.0 regression] sqrt(pow(x,y)) != pow(x,y*0.5) (with -ffast-math)
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-02-07 13:25 ---
Fixed.

--
           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775
[Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-02-03 17:32 --- Subject: Re: [4.0 Regression] threefold performance loss, not inlining as much bonzini at gcc dot gnu dot org wrote: > To the reporter: in this case you probably want __attribute__ ((leafify)), > just > in case, though you are right in expecting the compiler to inline it. But of course attribute leafify is not available without patching your gcc sources. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
[Bug middle-end/19775] [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5)
--
           What    |Removed                     |Added
           Severity|normal                      |critical
           Keywords|                            |wrong-code
      Known to fail|                            |3.4.4 4.0.0
      Known to work|                            |3.3.5
           Priority|P2                          |P1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775
[Bug middle-end/19775] New: [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5)
This one should not abort: #include #include int main() { double x = -1.0; if (sqrt(pow(x,2)) != 1.0) abort(); return 0; } but both, 3.4.4 and 4.0.0 do sqrt(pow(x,y)) -> pow(x,y*0.5) which in this case means sqrt(1.0) -> -1.0. Ouch. -- Summary: [3.3 regression] sqrt(pow(x,y)) != pow(x,y*0.5) Product: gcc Version: 3.4.4 Status: UNCONFIRMED Severity: normal Priority: P2 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19775
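The two expressions can be compared directly. A minimal sketch follows; the function names are made up for this note, and the header is an assumption since the include names did not survive in the report above:

```cpp
#include <cassert>
#include <cmath>

// What the source asks for: sqrt(pow(x, y)).
double sqrt_of_pow(double x, double y) {
    return std::sqrt(std::pow(x, y));
}

// What the rewrite computes instead: pow(x, y * 0.5).
double pow_half_exponent(double x, double y) {
    return std::pow(x, y * 0.5);
}

// For x = -1 and y = 2 the two disagree in sign.
void check_sign() {
    assert(sqrt_of_pow(-1.0, 2.0) == 1.0);
    assert(pow_half_exponent(-1.0, 2.0) == -1.0);
}
```

The rewrite is only an identity for non-negative x, which is why it is wrong for this testcase.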
[Bug tree-optimization/19639] Funny (horrible) code for empty destructor
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-30 18:54 --- Subject: Re: Funny (horrible) code for empty destructor pinskia at gcc dot gnu dot org wrote: > --- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-29 > 21:19 --- > As(In reply to comment #7) > >>Or we could simply unroll the loop completely, but while SCEV finds >>the IV as > > > Again this is most likely because fold does not fold "&x.foo[2] - 4B" to > "&x.foo[0]", or someone forgets > to call fold on that. I know that fold_stmt can do it. Yeah, I can find code to fold &x.foo[i] - c * j, but not without the c * mult. I'll look into this later. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639
[Bug tree-optimization/19639] Funny (horrible) code for empty destructor
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-29 21:14 --- Or we could simply unroll the loop completely, but while SCEV finds the IV as (set_scalar_evolution (scalar = this_6) (scalar_evolution = {(struct Foo * const) &x.foo[2] - 4B, +, -4B}_1)) ) it does not know about the number of iterations: (set_nb_iterations_in_loop = scev_not_known)) # BLOCK 1 # PRED: 3 [100.0%] (fallthru) 0 [100.0%] (fallthru,exec) Invalid sum of incoming frequencies 10258, should be 1 # thisD.1628_1 = PHI ; :; thisD.1628_6 = thisD.1628_1 - 4; if (thisD.1628_6 == &xD.1600.fooD.1587) goto ; else goto ; # SUCC: 2 [11.0%] (loop_exit,true,exec) 3 [89.0%] (dfs_back,false,exec) # BLOCK 3 # PRED: 1 [89.0%] (dfs_back,false,exec) :; goto (); # SUCC: 1 [100.0%] (fallthru) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639
[Bug middle-end/19402] __builtin_powi? still missing
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-28 15:29 --- Looking into it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19402
[Bug tree-optimization/17640] empty loop not removed after optimization
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-28 14:26 ---
One patch for empty-loop removal was posted here by Zdenek: http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01679.html

--
           What    |Removed                     |Added
                 CC|                            |rguenth at tat dot physik dot uni-tuebingen dot de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17640
[Bug tree-optimization/19639] Funny (horrible) code for empty destructor
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-28 14:21 --- Folding &x.foo[2] == &x.foo to false does not help the testcase, as fold never sees this comparison. Instead the initial code the C++ frontend creates for ctor and dtor of arrays contains temporaries for these already. It seems the C++ frontend tries to be clever here, creating pointer IVs for the loop and doing too much manual optimizing. What other pass than fold() is supposed to handle this sort of simplification? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-27 14:53 --- Bootstrapping and testing completed successfully, but for the testcase int g(void) { struct { int b[2]; } x; return &x.b[0] == &x.b[1]; } we have lowered the comparison to unit size align 32 symtab 0 alias set -1 precision 32 min max pointer_to_this > invariant arg 0 public unsigned SI size unit size align 32 symtab 0 alias set -1> invariant arg 0 invariant arg 0 arg 0 arg 1 >>> arg 1 invariant arg 0 invariant arg 0 invariant arg 0 arg 0 arg 1 >>> arg 1 >> and what confuses is the extra(?) nop_exprs - can I somehow avoid adding another path for this case? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 18:03 --- Fails without the patch, too, with the same error. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 17:24 ---
Hmm, it seems it causes

stage1/xgcc -Bstage1/ -B/usr/local/i686-pc-linux-gnu/bin/ -c -O2 -g -fomit-frame-pointer -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wold-style-definition -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/rguenth/src/gcc/gcc4.0/gcc -I/home/rguenth/src/gcc/gcc4.0/gcc/. -I/home/rguenth/src/gcc/gcc4.0/gcc/../include -I/home/rguenth/src/gcc/gcc4.0/gcc/../libcpp/include /home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c -o ggc-page.o
/home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c: In function 'ggc_pch_read':
/home/rguenth/src/gcc/gcc4.0/gcc/ggc-page.c:2304: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

#0  0x081da08c in tsi_stmt (i={ptr = 0x0, container = 0x40798d50})
    at /home/rguenth/src/gcc/gcc4.0/gcc/tree-iterator.h:93
#1  0x081da5a6 in bsi_stmt (i= {tsi = {ptr = 0x0, container = 0x40798d50}, bb = 0x401d4360})
    at /home/rguenth/src/gcc/gcc4.0/gcc/tree-flow-inline.h:572
#2  0x081cb6b4 in stmt_after_ip_original_pos (cand=0x88104f8, stmt=0x40832a00)
    at /home/rguenth/src/gcc/gcc4.0/gcc/tree-ssa-loop-ivopts.c:613
#3  0x081cb751 in stmt_after_increment (loop=, cand=0x88104f8, stmt=0x40832a00)
    at /home/rguenth/src/gcc/gcc4.0/gcc/tree-ssa-loop-ivopts.c:635

no time to investigate - maybe an unrelated problem (didn't check if bootstrap succeeds without patch).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 16:16 --- Umm, no. We fold the ARRAY_REF comparison to PLUS_EXPR(ADDR_EXPR, INTEGER_CST) == PLUS_EXPR(ADDR_EXPR, INTEGER_CST) oh well ;) So I guess transforming &a + i truth_op &a + j to i truth_op j is always correct, as &a - &a == 0. For &b[1] == b though, we'll have to do more checks for this. Patch attached, bootstrap and testing in progress. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
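The equivalence behind this fold can be checked at the source level. This is only a sketch with a made-up array and made-up function names; it says nothing about how fold represents the operands internally, and it assumes a non-zero element size:

```cpp
#include <cassert>
#include <cstddef>

namespace {
int table[4];  // hypothetical array standing in for "a"
}

// &a[i] == &a[j], compared as addresses.
bool equal_by_address(std::size_t i, std::size_t j) {
    return &table[i] == &table[j];
}

// The folded form: compare the offsets only.
bool equal_by_index(std::size_t i, std::size_t j) {
    return i == j;
}

// Both forms agree for every pair of valid indices.
bool folds_agree() {
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            if (equal_by_address(i, j) != equal_by_index(i, j))
                return false;
    return true;
}
```

With zero-sized elements every element would share one address, which is the special case raised elsewhere in this PR.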
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 15:30 --- Ok - I guess it's ARRAY_REFs that are not folded ;) So the summary could be "fold misses that two ARRAY_REFs with different offset of the same arrary are obviously not equal". But I'm not allowed to change that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 14:54 ---
Subject: Re: fold misses that two ADDR_EXPR of an arrary obvious not equal

On 26 Jan 2005, pinskia at gcc dot gnu dot org wrote:

> (In reply to comment #5)
> > Could we, in general, fold &a[i] TRUTHOP &a[j] to i TRUTHOP j? I guess the
> > only special case would be for sizeof(a[i]) == 0 -- but that is not allowed
> > by the standard? I'll be wading through fold tomorrow and look where to add
> > this transformation.
> sizeof(a[i]) can be zero for other languages besides C++ (C for an example).
> I gave you an hint where this can be fixed by the coment :).

Apart from this, the following should fix it (while bootstrapping I'll search for truthcode_p() and a way to test the type size):

Index: fold-const.c
===
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.497
diff -u -r1.497 fold-const.c
--- fold-const.c 23 Jan 2005 15:05:29 - 1.497
+++ fold-const.c 26 Jan 2005 14:53:38 -
@@ -8245,6 +8245,15 @@
                 ? code == EQ_EXPR : code != EQ_EXPR,
                 type);

+      /* If this is a comparison of two ADDR_EXPRs of the same object
+         and the objects size is not zero, then we can fold this to
+         a comparison of the two offsets.  */
+      if ((code == EQ_EXPR || code == NE_EXPR /* FIXME: rest */)
+          && TREE_CODE (arg0) == ADDR_EXPR
+          && TREE_CODE (arg1) == ADDR_EXPR
+          && operand_equal_p (arg0, arg1, 0))
+        return fold (build2 (code, type, TREE_OPERAND (arg0, 1), TREE_OPERAND (arg1, 0)));
+
       if (FLOAT_TYPE_P (TREE_TYPE (arg0)))
         {
           tree targ0 = strip_float_extensions (arg0);
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/15791] fold misses that two ADDR_EXPR of an arrary obvious not equal
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 14:35 ---
Could we, in general, fold &a[i] TRUTHOP &a[j] to i TRUTHOP j? I guess the only special case would be for sizeof(a[i]) == 0 -- but that is not allowed by the standard? I'll be wading through fold tomorrow and look where to add this transformation.

--
           What    |Removed                     |Added
                 CC|                            |rguenth at tat dot physik dot uni-tuebingen dot de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15791
[Bug tree-optimization/19639] Funny (horrible) code for empty destructor
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 14:10 --- We can also not fold &i[0] == &i[1] to false in int foo(void) { int i[2]; if (&i[0] == &i[1]) return 1; return 0; } or i+0 == i+1 which is transformed to &i[0] == &i[1]. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639
[Bug tree-optimization/19639] New: Funny (horrible) code for empty destructor
The following simple testcase

struct Foo { ~Foo() {} int i; };
struct NonPod { Foo foo[2]; };
void foo(void) { NonPod x; }

produces(!) at -O2

_Z3foov:
.LFB5:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $16, %esp
.LCFI2:
        leal    -2(%ebp), %edx
        movl    %ebp, %eax
        .p2align 4,,15
.L4:
        decl    %eax
        cmpl    %edx, %eax
        jne     .L4
        leave
        ret

yay! Looking at the optimized tree-dump, it contains a funny loop:

void foo() ()
{
  struct Foo * const this;
  register struct Foo * D.1621;
  struct Foo[2] * D.1620;
  struct NonPod x;

:
  if (&x.foo[2] == &x.foo) goto ; else goto ;

:;
  this = &x.foo[2];

:;
  this = this - 1;
  if (this == &x.foo) goto ; else goto ;

:;
  return;

}

which is roughly what is generated initially by the C++ frontend for the dtor:

;; Function NonPod::~NonPod() (_ZN6NonPodD1Ev *INTERNAL* )
;; enabled by -tree-original

{
  <<< Unknown tree: if_stmt 1 >>> ;
  try
    {
    }
  finally
    {
      {
        register struct Foo * D.1599;

        (if (&((struct NonPod *) this)->foo != 0B)
          {
            (void) (D.1599 = &((struct NonPod *) this)->foo + 2);
            while (1)
              {
                if (&((struct NonPod *) this)->foo == D.1599) break;
                (void) (D.1599 = D.1599 - 1);;
                __comp_dtor (NON_LVALUE_EXPR );;
              };
          }
        else
          {
            0
          });
      }
    }
}
:;

Note the same happens for an empty struct Foo, but even avoiding the ambiguous(?) &this->foo[2] - &this->foo[1] doesn't help. The RTL unroller, if enabled, gets rid of the ugliest stuff from above, but apparently the tree loop optimizer does not know how to handle this loop.

_Z3foov:
.LFB5:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $16, %esp
.LCFI2:
        movl    %ebp, %esp
        popl    %ebp
        ret

--
           Summary: Funny (horrible) code for empty destructor
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19639
[Bug tree-optimization/19637] New: Missed constant propagation with placement new
For the following testcase with three similar functions we do different tree optimizations: #include struct Foo { Foo() { i[0] = 1; } int i[2]; }; int foo_char(void) { int i[2]; new (reinterpret_cast(i)) Foo(); return reinterpret_cast(i)->i[0]; } int foo_void(void) { int i[2]; new (reinterpret_cast(i)) Foo(); return reinterpret_cast(i)->i[0]; } int foo_void_offset(void) { int i[2]; new (reinterpret_cast(&i[0])) Foo(); return reinterpret_cast(&i[0])->i[0]; } We only can optimize the foo_void_offset() variant to return 1, the foo_void() variant results in : this = (struct Foo *) &i[0]; this->i[0] = 1; i.6 = (struct Foo *) &i; return i.6->i[0]; where the difference starts in what the frontend produces: (void) (TARGET_EXPR ; and return = ((struct Foo *) &i[0])->i[0]; vs. (void) (TARGET_EXPR ; and return = ((struct Foo *) (int *) &i)->i[0]; note that mixing &i[0] and i does not allow folding. For the char* variant we even cannot prove that &i is non-null (!?): : i.2 = (char *) &i; __p = i.2; this = (struct Foo *) __p; if (__p != 0B) goto ; else goto ; :; this->i[0] = 1; :; i.4 = (struct Foo *) &i; return i.4->i[0]; though this might be somehow related to type-based aliasing rules(?). Note that the char variant does not care if &i[0] or plain i is specified. -- Summary: Missed constant propagation with placement new Product: gcc Version: 4.0.0 Status: UNCONFIRMED Severity: normal Priority: P2 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19637
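The template arguments of the casts and the header name did not survive in the text above. Going by the function names, a plausible reconstruction is the following sketch; the `<new>` header and the `char *`, `void *` and `Foo *` cast targets are assumptions, not a copy of the original attachment:

```cpp
#include <cassert>
#include <new>

struct Foo {
    Foo() { i[0] = 1; }
    int i[2];
};

// Placement address passed as char* (assumed cast target).
int foo_char(void) {
    int i[2];
    new (reinterpret_cast<char *>(i)) Foo();
    return reinterpret_cast<Foo *>(i)->i[0];
}

// Placement address passed as void* (assumed cast target).
int foo_void(void) {
    int i[2];
    new (reinterpret_cast<void *>(i)) Foo();
    return reinterpret_cast<Foo *>(i)->i[0];
}

// Same as foo_void, but spelling the array as &i[0].
int foo_void_offset(void) {
    int i[2];
    new (reinterpret_cast<void *>(&i[0])) Foo();
    return reinterpret_cast<Foo *>(&i[0])->i[0];
}

// All three variants are expected to yield 1 at run time; the report is
// about which of them the tree optimizers reduce to "return 1".
void check_variants() {
    assert(foo_char() == 1);
    assert(foo_void() == 1);
    assert(foo_void_offset() == 1);
}
```

A stricter version would use the pointer returned by the placement new expression instead of casting the array again, but that would hide the cast pattern this report is about.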
[Bug middle-end/13776] [4.0 Regression] Many C++ compile-time regressions for MICO's ORB code
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 10:24 ---
Subject: Re: [4.0 Regression] Many C++ compile-time regressions for MICO's ORB code

> Bah, I hate profiles for "cc1plus -O2 ir.ii" without peaks:
>
> CPU: P4 / Xeon with 2 hyper-threads, speed 3194.17 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 10
> samples  %        symbol name
> 25018    1.6858   walk_tree
> 24322    1.6389   cgraph_node_for_asm
> 19586    1.3198   htab_find_slot_with_hash

Do you have numbers on whether we are memory-bandwidth limited here? If not, we might micro-optimize hash table access somewhat more.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776
[Bug tree-optimization/19626] Aliasing says stores to local memory do alias
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-26 08:47 ---
Subject: Re: Aliasing says stores to local memory do alias

> D.2540 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
> That confuses the aliasing mechanism
> buffer is of type int* but you are casting it to Loc<1> *.

Telling it the truth by having an array of Loc<1> instead doesn't help. I suppose you're talking about not decomposing Loc<2> into two Loc<1> as an intermediate step? Well, yes, that's a design decision I cannot change. It looks superfluous for Loc<>, but makes sense for the more complex domain objects like Interval and Range (but that's a different story). But in principle a compiler could determine that the two objects cannot alias, even with this intertwined type structure?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626
[Bug tree-optimization/19626] Aliasing says stores to local memory do alias
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-25 16:57 --- Created an attachment (id=8062) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8062&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626
[Bug tree-optimization/19626] New: Aliasing says stores to local memory do alias
Given the attached testcase, for reference, the interesting function is this:

int loc_test(void)
{
  const Loc<2> dX(1, 0);
  const Loc<2> k(0, 1);
  return k[0].first() + dX[0].first();
}

aliasing tells us that the initializations of dX and k alias each other:

:
  D.2540 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
  #   dX_357 = V_MAY_DEF ;
  #   k_358 = V_MAY_DEF ;
  *&(&D.2540->D.2094)->D.2057.domain_m = 1;
  #   dX_365 = V_MAY_DEF ;
  #   k_364 = V_MAY_DEF ;
  *&(&(D.2540 + 4B)->D.2094)->D.2057.domain_m = 0;
  D.2682 = (struct Loc<1> *) &k.D.2210.D.2166.domain_m.buffer;
  #   dX_337 = V_MAY_DEF ;
  #   k_338 = V_MAY_DEF ;
  *&(&D.2682->D.2094)->D.2057.domain_m = 0;
  #   dX_361 = V_MAY_DEF ;
  #   k_63 = V_MAY_DEF ;
  *&(&(D.2682 + 4B)->D.2094)->D.2057.domain_m = 1;
  D.2769 = (struct Loc<1> *) &k.D.2210.D.2166.domain_m.buffer;
  D.2791 = (struct Loc<1> *) &dX.D.2210.D.2166.domain_m.buffer;
  return (&D.2769->D.2094)->D.2057.domain_m + (&D.2791->D.2094)->D.2057.domain_m;

which is of course (trivially) not true. This may be obfuscated by the actual implementation of the template class Loc (see attached complete testcase). At the RTL level we are able to optimize this to just return 1, as expected. This pessimizes tree loop optimizations if such constructs are used inside a loop and as induction variable.

--
           Summary: Aliasing says stores to local memory do alias
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19626
[Bug tree-optimization/19624] PRE pessimizes ivopts
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-25 15:27 --- I guess making PRE and ivopts playing nicely together perfectly is near to impossible - but any improvement in the 4.0 timeframe is welcome! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624
[Bug tree-optimization/19624] PRE pessimizes ivopts
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-25 14:52 --- Oh, in principle this should compile to roughly the same as void c_test(double *a, double *b, int ei, int ej, int stridea, int strideb) { for (int j=0; jhttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624
[Bug tree-optimization/19624] PRE pessimizes ivopts
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-25 14:45 --- Created an attachment (id=8060) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8060&action=view) testcase The testcase is reduced from a complex POOMA program. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624
[Bug tree-optimization/19624] New: PRE pessimizes ivopts
The attached testcase is pessimized by PRE. Be sure to get tree-level complete loop unrolling enabled, for instance with -O2 -funroll-loops on current mainline. With PRE, far fewer computations are hoisted out of the inner loop. Note this is not a regression from 3.4, which is not able to decompose Loc appropriately or to avoid instantiating temporary objects of this type.

--
           Summary: PRE pessimizes ivopts
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19624
[Bug tree-optimization/19401] Trivial loop not unrolled
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-24 09:43 --- Another one - matrix multiplication: /* A [NxM], B [MxP] */ #define DOLOOP(N, M, P) \ void matmul ## N ## M ## P(double *res, const double *A, const double *B) \ { \ int i,j,k; \ for (k=0; khttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401
[Bug tree-optimization/19516] missed optimization (bool)
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-23 11:13 --- How come that, if I change _Bool to int, we get the following after the tree optimizations? foo (flag) { int D.1121; : D.1121_2 = *flag_1; if (D.1121_2 != 0) goto ; else goto ; :; bar (); D.1121_11 = *flag_1; if (D.1121_11 != 0) goto ; else goto ; :; bar () [tail call]; :; return; } If your analysis were correct, this shouldn't be possible, no? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19516
[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-21 16:07 --- Experimenting with SRA inside loop together with cleanup passes after cunroll/sra didn't reveal anything good - even with loop cfg_cleanup patched in. See thread starting at http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01315.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754
[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-20 15:15 ---
Subject: Re: unrolling happens too late/SRA does not happen late enough

On 20 Jan 2005, dberlin at dberlin dot org wrote:

> Wait, why are we running SRA twice again at all?
> I can't figure this out from the bug report, other than seeing that we
> "could sra c.array", but i don't see why that requires a loop opt first.

We don't run SRA twice. But an early loop unrolling will change, for instance,

  for (unsigned int d=0; d<4; ++d)
    c.array[d] = a.array[d] * b.array[d];

to

  c.array[0] = a.array[0] * b.array[0];
  c.array[1] = a.array[1] * b.array[1];
  c.array[2] = a.array[2] * b.array[2];
  c.array[3] = a.array[3] * b.array[3];

and SRA can only scalarize this variant, not the one where the loop is still there. That's the whole point of the loop<->sra ordering problem. And of course SRA may then expose new interesting choices for IVs of outer loops - at least I think so.

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754
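The two variants discussed above can be put side by side in a small C sketch. The type name vec4 and the function names are made up for illustration and do not come from the testcase; the point is only that both forms compute the same result, while the unrolled form uses nothing but constant indices, which is the shape SRA is described as being able to scalarize.

```c
/* Hypothetical stand-in for the small fixed-size aggregate in the testcase. */
struct vec4 { float array[4]; };

/* Loop form: the array is indexed by a variable, so the aggregate cannot be
   split into independent scalars while the loop is still present. */
struct vec4 mul_loop(struct vec4 a, struct vec4 b)
{
    struct vec4 c;
    for (unsigned int d = 0; d < 4; ++d)
        c.array[d] = a.array[d] * b.array[d];
    return c;
}

/* Completely unrolled form: every access uses a constant index, so each
   element can in principle be replaced by its own scalar variable. */
struct vec4 mul_unrolled(struct vec4 a, struct vec4 b)
{
    struct vec4 c;
    c.array[0] = a.array[0] * b.array[0];
    c.array[1] = a.array[1] * b.array[1];
    c.array[2] = a.array[2] * b.array[2];
    c.array[3] = a.array[3] * b.array[3];
    return c;
}
```

Whether a given compiler version actually scalarizes the second form depends on its pass ordering, which is exactly what this PR is about.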
[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-20 14:57 ---
Subject: Re: unrolling happens too late/SRA does not happen late enough

> Note PR 18755 blocks this if we go the SRA after loop optimization which
> seems like a better idea.

I do not completely understand this sentence ;) I argue that SRA after loop is a bad idea, because SRA, in my testcases, will expose new opportunities for selecting IVs, so we'll need to run another loop pass after SRA. So I opted for the order "loop0 sra loop" instead of "sra loop sra loop", which is one pass less.

Also, with -ftree-early-loop-optimize we get in .vars for PR18755:

;; Function float foobar() (_Z6foobarv)

float foobar() ()
{
:
  return a.array[3] * b.array[3] + b.array[2] * a.array[2] + b.array[1] * a.array[1] + a.array[0] * b.array[0] + 0.0;
}

which is what we want? Or do we now just paper over another problem here? I'm confused...

Richard.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754
[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-20 10:57 --- This is also somewhat related to PR19401 as we do not unroll loops completely with just -O2 at the moment, which is important for the second testcase. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754
[Bug tree-optimization/19507] missed tree-optimization
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-18 22:29 --- Done. PR19516. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507
[Bug tree-optimization/19516] New: missed optimization
Actually a side-bug of 19507. The testcase

  void bar(void);

  void foo(const _Bool *flag)
  {
    if (*flag)
      bar();
    if (*flag)
      bar();
  }

should be transformed to (at the tree level):

  if (!*flag)
    return;
  bar();
  if (*flag)
    bar();

This is only done at the RTL level at the moment. Andrew Pinski reports that this works if we exchange _Bool for int/short/char.

--
Summary: missed optimization
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19516
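To make the requested transformation concrete, here is a minimal C sketch. The globals the_flag, clear_in_bar and bar_calls, the helper count_calls and the name foo_threaded are illustrative assumptions, not part of the report. It shows that the rewritten form calls bar() exactly as often as the original, including when bar() clears the flag through another path, which is why the second test has to stay.

```c
#include <stdbool.h>

/* Hypothetical test scaffolding: a flag that bar() may or may not clear. */
static bool the_flag;
static bool clear_in_bar;
static int bar_calls;

void bar(void)
{
    ++bar_calls;
    if (clear_in_bar)
        the_flag = false;   /* bar() legitimately modifies the pointed-to object */
}

/* Original form from the report. */
void foo(const bool *flag)
{
    if (*flag)
        bar();
    if (*flag)
        bar();
}

/* Form the report asks for: only the first test is threaded. */
void foo_threaded(const bool *flag)
{
    if (!*flag)
        return;
    bar();
    if (*flag)              /* must be re-read, bar() may have changed it */
        bar();
}

/* Runs fn once from a known state and returns how often bar() was called. */
int count_calls(void (*fn)(const bool *), bool initial, bool clear)
{
    the_flag = initial;
    clear_in_bar = clear;
    bar_calls = 0;
    fn(&the_flag);
    return bar_calls;
}
```

The const qualifier on the pointer only restricts foo itself; it says nothing about other code writing to the same object, so the sketch stays within what C allows.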
[Bug tree-optimization/19507] missed tree-optimization
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-18 20:10 ---
Subject: Re: missed tree-optimization

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-18 20:06 ---
> (In reply to comment #1)
>
>>A C testcase with the missing jump threading(?):
>>
>>void bar(void);
>>
>>void foo(const _Bool *flag)
>>{
>>if (*flag)
>>bar();
>>if (*flag)
>>bar();
>>}
>
>
> No this one cannot be optimized because we can change what is in flag in
> bar();

I meant this should be transformed to

  if (!*flag)
    return;
  bar();
  if (*flag)
    bar();

This is done at the RTL level, but not at the tree level. I should file a separate bug for this one, really.

Richard.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507
[Bug tree-optimization/19507] missed tree-optimization
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-18 16:39 ---
A C testcase with the missing jump threading(?):

  void bar(void);

  void foo(const _Bool *flag)
  {
    if (*flag)
      bar();
    if (*flag)
      bar();
  }

A testcase where we are able to thread the jump:

  extern long int random(void);

  void foo(void)
  {
    long int i = random();
    if (i)
      i = random();
    if (i)
      i = random();
  }

The difference seems to be that we use

  .GLOBAL_VAR_10 = V_MAY_DEF <.GLOBAL_VAR_9>;

in the latter, while we use

  TMT.0_9 = V_MAY_DEF ;

in the former. Though, of course, I don't know what either means.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507
[Bug tree-optimization/19507] New: missed tree-optimization
The following testcase:

  class Flag {
  public:
    Flag(bool f) : flag(f) {}
    bool test() const { return flag; }
  private:
    const bool flag;
  };

  void bar(void);

  void foo(const Flag& f)
  {
    if (f.test())
      bar();
    if (f.test())
      bar();
  }

should, from my point of view, generate exactly one test and optimize away the redundant one. I do not see any well-formed way for bar() to modify Flag::flag.

With mainline -O2 -S -fdump-tree-optimized-vops we get for t63.optimized:

  :
    if (f->flag != 0) goto ; else goto ;
  :;
    # TMT.2_17 = V_MAY_DEF ;
    bar ();
  :;
    if (f->flag != 0) goto ; else goto ;
  :;
    # TMT.2_16 = V_MAY_DEF ;
    bar () [tail call];
  :;
    return;

The RTL optimizers exploit a valid optimization, namely:

  _Z3fooRK4Flag:
  .LFB6:
          pushl %ebx              #
  .LCFI0:
          subl $8, %esp           #,
  .LCFI1:
          movl 16(%esp), %ebx     # f, f
          cmpb $0, (%ebx)         # .flag
          jne .L8                 #,
  .L6:
          addl $8, %esp           #,
          popl %ebx               #
          ret
          .p2align 4,,7
  .L8:
          call _Z3barv            #
          cmpb $0, (%ebx)         # .flag
          .p2align 4,,4
          je .L6                  #,
          addl $8, %esp           #,
          popl %ebx               #
          jmp _Z3barv             #

where you can see we optimized the function into the equivalent of

  if (!f.test())
    return;
  bar();
  if (!f.test())
    return;
  bar();

Who is supposed to apply the corresponding tree optimization here? Of course, I think it is valid to omit the second test completely, as there is no valid way for bar() to change Flag::flag.

Note that this may be a frontend issue, as to the tree optimizers this may be no different than

  void foo(const bool& f)
  {
    if (f)
      bar();
    if (f)
      bar();
  }

where there of course are valid ways for bar() to change f.

--
Summary: missed tree-optimization
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19507
[Bug middle-end/19402] __builtin_powi? still missing
-- What|Removed |Added CC||rguenth at tat dot physik ||dot uni-tuebingen dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19402
[Bug tree-optimization/19401] Trivial loop not unrolled
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 16:24 --- Or stuff often found in C++ libraries: template struct Vector { Vector(float init) { for (int i=0; ihttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401
[Bug tree-optimization/19401] Trivial loop not unrolled
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 16:19 --- In 3.4 one was able to do this by specifying -fpeel-loops and got complete loop peeling enabled. In 4.0 this is also the case, but only for the RTL unroller - the tree unroller is not affected and as such _this_ unrolling does not help PR11706. - Just another datapoint. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401
[Bug libstdc++/11706] std::pow(T, int) implementation pessimizes code
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 16:17 --- Current status is that with -O2 on mainline we generate the same code for ::pow(x, 2) and std::pow(x, 2.0), and that code is better than what we generate for std::pow(x, 2), which loses because of the lack of unrolling (PR19401). Also, ::pow(x, 27) and other exponents will always generate better code than the std::pow(x, 27) variant due to the technically superior implementation of gcc/builtins.c:expand_powi. The attached patch solves all of these problems, unfortunately in ways the libstdc++ maintainer(s) do not like. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11706
[Bug tree-optimization/19401] Trivial loop not unrolled
-- What|Removed |Added OtherBugsDependingO||11706 nThis|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401
[Bug tree-optimization/19401] New: Trivial loop not unrolled
We do not unroll the loop in

  double foo(double __x)
  {
    unsigned int __n = 2;
    double __y = __n % 2 ? __x : 1;
    while (__n >>= 1)
      {
        __x = __x * __x;
        if (__n % 2)
          __y = __y * __x;
      }
    return __y;
  }

with -O2, which causes us to emit gratuitously worse code for std::pow(x, 2) than for std::pow(x, 2.0). We should definitely get this right without -funroll-loops and all its side-effects.

--
Summary: Trivial loop not unrolled
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19401
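For reference, a small C sketch of what complete unrolling should buy here. The function foo is the loop from the report with the reserved double-underscore names replaced by plain ones; foo_unrolled is a hand-written illustration of the expected result, not compiler output. With the exponent fixed at 2 the loop body runs exactly once, and the whole function reduces to a single multiplication.

```c
/* The loop from the report, with the exponent fixed at 2. */
double foo(double x)
{
    unsigned int n = 2;
    double y = n % 2 ? x : 1;   /* n is even, so y starts at 1 */
    while (n >>= 1) {           /* n becomes 1, then 0: one iteration */
        x = x * x;
        if (n % 2)
            y = y * x;          /* taken once: y = 1 * (x * x) */
    }
    return y;
}

/* What remains after complete unrolling and constant propagation
   (hand-derived, shown only for comparison). */
double foo_unrolled(double x)
{
    return x * x;
}
```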
[Bug rtl-optimization/11707] [3.4 Regression] [new unroller] constants not propagated in unrolled loop iterations with a conditional
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 11:05 --- I can re-confirm that the patch moves 3.4 to the state of 3.3 - i.e. with an extra imull compared to 2.95 and 4.0. The patch has bootstrapped with checking enabled and -funroll-loops on ia64, testing is in process. I'll formally submit the patch shortly. For the imull regression I'll file a separate bug with a possibly reduced testcase. -- What|Removed |Added CC||rakdver at gcc dot gnu dot ||org Known to fail|3.4.0 |3.4.0 3.4.3 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11707
[Bug c++/10611] operations on vector mode not recognized in C++
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 09:44 --- What is the status on this issue? I.e. +,-,*,/ on vector types for C++? Note that trying to work around this missing feature with operator overloading like v4sf operator+(const v4sf& a, const v4sf& b) { return __builtin_ia32_addps(a, b); } (which would be again machine specific, but anyhow) doesn't work: t.c:3: error: 'float __vector__ operator+(const float __vector__&, const float __vector__&)' must have an argument of class or enumerated type. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10611
[Bug rtl-optimization/13246] [new-ra][meta-bug] new-ra related problems
-- Bug 13246 depends on bug 10469, which changed state. Bug 10469 Summary: constant V4SF loads get moved inside loop http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10469 What|Old Value |New Value Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13246
[Bug rtl-optimization/10469] constant V4SF loads get moved inside loop
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2005-01-12 09:35 --- I guess we won't ever fix this for 3.3 and new-ra is dead, so this is "fixed". -- What|Removed |Added Status|ASSIGNED|RESOLVED Known to work|3.4.1 |3.4.1 3.4.3 4.0.0 Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10469
[Bug target/19131] alloca returning unnecessarily aligned pointer and uses too much memory
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-23 22:23 --- Subject: Re: alloca returning unnecessarily aligned pointer and uses too much memory pinskia at gcc dot gnu dot org wrote: > --- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-22 > 15:06 --- > The reason you cannot find anything in the C standard is because this is ABI > thing so this is invalid Where is the ABI specified? Is it the "System V ABI, Intel386 Architecture Processor Supplement" document I found at http://www.caldera.com/developers/devspecs/abi386-4.pdf? This one talks about word-alignment of the stack, not 16 byte alignment. > We need to keep the stack aligned sorry. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131
[Bug target/19131] alloca returning unnecessarily aligned pointer and uses too much memory
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-22 18:16 ---
Subject: Re: alloca returning unnecessarily aligned pointer and uses too much memory

pinskia at gcc dot gnu dot org wrote:
> --- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-22 15:06 ---
> The reason you cannot find anything in the C standard is because this is ABI
> thing so this is invalid
>
> We need to keep the stack aligned sorry.

Inside a function!? Or just at function callsites? Humm, the Intel compiler produces

  ..B1.3:                         # Preds ..B1.2 ..B1.4
          movl $4, %eax           #5.12
          subl %eax, %esp         #5.12
          andl $-16, %esp         #5.12
          movl %esp, %eax         #5.12
                                  # LOE eax ebx ebp esi edi
  ..B1.4:                         # Preds ..B1.3
          addl (%eax), %ebx       #6.3
          addl $1, %esi           #4.21
          cmpl %edi, %esi         #4.2
          jl ..B1.3

which looks like it aligns the stack after alloca, but it manages to waste less space by subtracting $4, not $32. Also if the ABI says the stack is aligned, why do we not make use of this and avoid the andl $-16, %esp -- or is the alignment only about alloca? I'm a bit confused.

Richard.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131
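The arithmetic behind the subl/andl pair quoted above can be sketched in C. The function names are hypothetical and the stack pointer is modelled as a plain integer; this is not how either compiler implements alloca, only a way to check the numbers. Starting from a 16-byte aligned stack pointer, a 4-byte request costs 16 bytes of stack whether one subtracts first and masks afterwards or rounds the size up first.

```c
#include <stdint.h>

/* Round an address down to a multiple of 16, like "andl $-16, %esp". */
uintptr_t align_down16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Subtract the requested size, then realign: the sequence in the listing. */
uintptr_t alloca_sub_then_align(uintptr_t sp, uintptr_t size)
{
    return align_down16(sp - size);
}

/* Alternative: round the size up to a multiple of 16 before subtracting. */
uintptr_t round_up16(uintptr_t size)
{
    return (size + 15) & ~(uintptr_t)15;
}
```

For example, with a stack pointer of 0x1000 and a request of 4 bytes, both approaches end up at 0xff0.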
[Bug tree-optimization/19131] New: alloca returning unnecessarily aligned pointer and uses too much memory
The testcase int foo(int bar) { int i, res = 0; for (i=0; ihttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131
[Bug tree-optimization/18754] unrolling happens too late/SRA does not happen late enough
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-16 17:08 --- The attached patch makes us for -O3 -funroll-loops -ffast-math produce in .vars float foobar() () { : return a.array[3] * b.array[3] + a.array[2] * b.array[2] + a.array[0] * b.array[0] + a.array[1] * b.array[1]; } though the assembly is as good as before. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18754
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-07 15:35 ---
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

On Tue, 7 Dec 2004, Richard Guenther wrote:

> static inline void foo() {}
> void bar() { foo(); }
>
> which for -O2 -fprofile-generate produces
>
> bar:
>         addl $1, .LPBX1
>         pushl %ebp
>         movl %esp, %ebp
>         adcl $0, .LPBX1+4
>         addl $1, .LPBX1+16
>         popl %ebp
>         adcl $0, .LPBX1+20
>         addl $1, .LPBX1+8
>         adcl $0, .LPBX1+12
>         ret

Mainline manages to produce

  bar:
          addl $1, .LPBX1
          pushl %ebp
          movl %esp, %ebp
          adcl $0, .LPBX1+4
          popl %ebp
          ret

but that's RTL instrumentation?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-07 15:09 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 7 Dec 2004, hubicka at ucw dot cz wrote: > > Yes, it seems so. Really nice improvement. Though profiling is > > sloow. I guess you avoid doing any CFG changing transformation > > for the profiling stage? I.e. not even inline the simplest functions? > > I can inline but only after actually instrumenting the functions. That > should minimize the costs, but I also noticed that tramp3d is > surprisingly a lot slower with profiling. > > > That would be the reason the Intel compiler is unusable with profiling > > for me. -fprofile-generate comes with a 50fold increase in runtime! > > -fprofile-generate is actually package of > -fprofile-arcs/-fprofile-values + -fprofile-values-transformations > It might be interesting to figure out whether -fprofile-arcs itself > brings similar slowdown. Only reason why this can happen I can think of > is the fact that after instrumenting we again inline a lot less or we > produce too many redundant counters. Perhaps it would make sense to > think about inlining functions reducing code size before instrumenting > as we would do that anyway, but it will be tricky to get gcov output and > -f* flags independence right then. Hm. There are a lot of counters - maybe it is possible to merge the counters themselves? In the resulting asm of tramp3d-v3, 30% of the lines are addl/adcl instructions for adding the profiling counts - where the total number of lines is just wc -l of a -S -fverbose-asm compilation. That is quite a lot. And the additions are in a cache-unfriendly sequence, too - dunno which optimization pass could improve this though. 
Consider

  static inline void foo() {}
  void bar() { foo(); }

which for -O2 -fprofile-generate produces

  bar:
          addl $1, .LPBX1
          pushl %ebp
          movl %esp, %ebp
          adcl $0, .LPBX1+4
          addl $1, .LPBX1+16
          popl %ebp
          adcl $0, .LPBX1+20
          addl $1, .LPBX1+8
          adcl $0, .LPBX1+12
          ret

that should be

  bar:
          addl $1, .LPBX1
          pushl %ebp
          movl %esp, %ebp
          adcl $0, .LPBX1+4
          addl $1, .LPBX1+8
          adcl $0, .LPBX1+12
          addl $1, .LPBX1+16
          adcl $0, .LPBX1+20
          ret

And of course all three counters could be merged. But that would need a changed gcov file format that somehow represents a callgraph with merged edges. The Intel compiler is so much worse here because all the counter adding is done thread-safe in a library (i.e. they have an extra call for every edge and do not do any inlining).

> How our profiling performance is compared to ICC?

ICC is a lot worse. ICC with -prof_gen causes a 1 fold slowdown (if the current snapshot of icc doesn't segfault compiling the tramp3d testcase) - ICC is completely unusable for me. So - GCC is great!

> > > It would be nice to experiment with this a little - in general the
> > > heuristics can be viewed as having three players. There are the limits
> > > (specified via --param) that it must obey, there is the cost model
> > > (estimated growth for inlining into all callees without profiling and
> > > the execute_count to estimated growth for inlining to one call with
> > > profiling) and the bin packing algorithm optimizing the gains while
> > > obeying the limits.
> > >
> > > With profiling in the cost model is pretty much realistic and it would
> > > be nice to figure out how the performance behaves when the individual
> > > limits are changed and why. If you have some time for experimentation,
> > > it would be very useful. I am trying to do the same with SPEC and GCC
> > > but I have difficulty to play with pooma or Gerald's application as I
> > > have little understanding what is going there. I will try it myself
> > > next but any feedback can be very useful here.
> > > > I can produce some numbers for the tramp testcase. > Thanks! Note that with changing the flags you should not need to > re-profile now so you can save quite a lot of time. Ah, that's indeed nice. Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-07 14:35 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 6 Dec 2004, hubicka at ucw dot cz wrote: > Looks like I get 4fold speedup on tree profiling with profiling compared > to tree profiling on mainline that is equivalent to speedup you are > seeing for leafify patch. That sounds pretty promising (so the new > heuristics can get the leafify idea without the hint from user and > hitting the code growth problems). Yes, it seems so. Really nice improvement. Though profiling is sloow. I guess you avoid doing any CFG changing transformation for the profiling stage? I.e. not even inline the simplest functions? That would be the reason the Intel compiler is unusable with profiling for me. -fprofile-generate comes with a 50fold increase in runtime! > It would be nice to experiment with this a little - in general the > heuristics can be viewed as having three players. There are the limits > (specified via --param) that it must obey, there is the cost model > (estimated growth for inlining into all callees without profiling and > the execute_count to estimated growth for inlining to one call with > profiling) and the bin packing algorithm optimizing the gains while > obeying the limits. > > With profiling in the cost model is pretty much realistic and it would > be nice to figure out how the performance behaves when the individual > limits are changed and why. If you have some time for experimentation, > it would be very useful. I am trying to do the same with SPEC and GCC > but I have difficulty to play with pooma or Gerald's application as I > have little understanding what is going there. I will try it myself > next but any feedback can be very useful here. I can produce some numbers for the tramp testcase. 
> My plan is to try understand the limits first and then try to get the > cost model better without profiling as it is a bit too clumsy to do both > at once. Do you have some written overview of the cost model? Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 14:31 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 6 Dec 2004, hubicka at ucw dot cz wrote: > > > the order of inlining decisions affecting this. I would be curious how > > > those results compare to leafify and whether the 0m27s is not caused by > > > misoptimization. > > > > You can check for misoptimization by looking at the final output. > > I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum > > will increase with the number of iterations. > > > > With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math > > -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for > > unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with > > leafification turned on, with it turned off, runtime increases > > to 0m31s with --param inline-unit-growth=175. > > I compiled with -O3, would it be possible for you to measure how much > speedup you get on mainline with -O3 and -O3+leafify? That would > probably allow me to relate those numbers somehow. 0m23s for -O3+leafify, 1m54s for -O3, 0m35s for -O3 --param inline-unit-growth=150. Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 13:18 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 6 Dec 2004, hubicka at ucw dot cz wrote: > The cfg inliner per se is not too interesting. What matters here is the > code size estimation and profitability estimation. I am playing with > this now and trying to get profile based inlining working. Yes, I guess the cfg inliner and some early dead code removal passes should improve code size metrics for stuff like template struct Foo { enum { val = X::val }; void foo() { if (val) ... else ... } }; with val being const. > For -n10 and tramp3d.cc I need 2m14s on mainline, 1m31s on the current > tree-profiling. With my new implementation I need 0m27s with profile > feedback and 2m53s without. I wonder what makes the new heuristics work > worse without profiling, but just increasing the inline-unit-growth very > slightly (to 155) I get 0m42s. This might be just little instability in Note that inline-unit-growth is 50 by default, so 155 is not a slight increase. > the order of inlining decisions affecting this. I would be curious how > those results compare to leafify and whether the 0m27s is not caused by > misoptimization. You can check for misoptimization by looking at the final output. I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum will increase with the number of iterations. With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with leafification turned on, with it turned off, runtime increases to 0m31s with --param inline-unit-growth=175. > Unless I will observe it otherwise (on SPEC with intermodule), I will > apply my current patch and try to improve the profitability analysis > without profiling incrementally. 
Ideally we ought to build estimated > profile and use it, but that needs some work so for the moment I guess I > will try to experiment with making loop depth available to the cgraph > code. Yes, loops could be "auto-leafified", but it will be difficult to statically check if that is worthwhile. Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 12:33 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 6 Dec 2004, pinskia at gcc dot gnu dot org wrote: > No reason to keep this one open, there is PR 17863 still. > Also note I heard from Honza that the tree > profiling branch with feedback can optimize better than with your > leafy patch. I tried the tree-profiling branch, and profile-based inlining is actually worse than "normal" inlining with inline-unit-growth=150. Worse by a factor of four. So, no cigar yet. And btw, profile-based inlining seems to ignore inline-unit-growth (at least it doesn't improve for greater values). And generating the profile is _very_ slow (for the tramp3d testcase). Runtime increases about 100 fold - not very good for creating a meaningful profile. Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 09:53 --- Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression On 6 Dec 2004, pinskia at gcc dot gnu dot org wrote: > No reason to keep this one open, there is PR 17863 still. Also note I heard > from Honza that the tree > profiling branch with feedback can optimize better than with your leafy > patch. Wow, that would be cool. Does the tree-profiling branch contain the cfg inliner? I'll try it asap. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-11-29 12:10 --- Documentation patches for 3.4 and mainline are here: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02457.html http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02551.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-11-29 11:04 --- Looking at the 3.4 branch the defaults for the relevant inlining parameters are the same. So the difference in performance has to be accounted to different tree-node counting (or to differences in the accounting during inlining). As we throttle inlining params if -Os is specified in opts.c: if (optimize_size) { /* Inlining of very small functions usually reduces total size. */ set_param_value ("max-inline-insns-single", 5); set_param_value ("max-inline-insns-auto", 5); flag_inline_functions = 1; may I suggest to throttle inline-unit-growth there, too (though it shouldn't have an effect with so small max-inline-insns-single). And then provide the documented limit (150) for inline-unit-growth? One may even argue that limiting overall unit growth is not important, as it is already limited by max-inline-insns-* and large-function-*. Also both inline-unit-growth and large-function-growth cause inlining to stop at the threshold leaving one with an unbalanced inlining decision. Why were these (growth) limits invented? Were there some particular testcases that broke down otherwise? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug tree-optimization/18704] New: Inlining limits cause 340% performance regression
Compared to 3.4, the default inlining limits in 4.0 cause a 340% performance regression on the tramp3d-v3.cpp testcase here: http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v3.cpp.gz The regression can be attributed to the inlining limits, as patching both compilers with the leafify patch results in same performance. Compilation options used are -Dleafify=fooblah -O2 -fpeel-loops -ffast-math -march=pentium4 -mfpmath=sse -fno-exceptions. Binary size is "improved" by about 9% with the current defaults. Using --param max-inline-insns-single=1000 worsens the situation to a Playing with the inlining params gives max-inline-insns-single large-function-growth inline-unit-growth regression 340% 1000 375% 500348% 200 -36% (1% size regression) 175 -35% (4% size improvement) 165 -12% 150 -12% (!?) 100 232% So I guess, limiting overall unit growth is bad - can we disable limiting at -Os, or provide a higher default value? The "correct" value will be different depending on the application. Also, the documented default value for inline-unit-growth is not what it actually seems to be (it is 50 reading params.def, large-function-growth is also not correctly documented). If we make the documented values the default, we get a 68% compile time and a 3.7% code size regression for a 71% performance improvement (this was including "correcting" the large-function-growth limit, which seems to hurt rather than help). -- Summary: Inlining limits cause 340% performance regression Product: gcc Version: 4.0.0 Status: UNCONFIRMED Severity: normal Priority: P2 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
[Bug c++/18296] Misleading diagnostic for recursive template instantiation
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-11-04 14:30 --- Subject: Re: Misleading diagnostic for recursive template instantiation On 4 Nov 2004, pinskia at gcc dot gnu dot org wrote: > Confirmed, I think PR 15538 would fix the problem because the class is an incomplete > type at this > point. Yes, maybe - though icpc (7.1 and 8.0) in this case isn't helpful, too: tests> icpc -c notype.cpp notype.cpp(29): error: class "ComponentView" has no member "Type_t" typename ComponentView::Type_t ^ detected during: instantiation of class "Array [with Dim=1, T=double, EngineTag=Brick]" at line 19 instantiation of class "ComponentView> [with Dim=1, T=double, EngineTag=Brick]" at line 36 compilation aborted for notype.cpp (code 2) suspiciously similar to gcc. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18296
[Bug c++/18296] New: Misleading diagnostic for recursive template instantiation
template struct CompFwd;
struct Brick;
template struct Engine;
template class Array;
template struct ComponentView;

template struct ComponentView >
{
  typedef Array Subject_t;
  typedef typename Subject_t::Engine_t Engine_t;
  typedef Array > Type_t;
};

template struct Array
{
  typedef Engine Engine_t;
  typedef Array This_t;
  typename ComponentView::Type_t comp(int i1) const;
};

typedef Array<1, double, Brick> Array_t;
typedef ComponentView::Type_t CView_t;

causes g++ to emit:

tests> g++-3.4 -c notype.cpp
notype.cpp: In instantiation of `Array<1, double, Brick>':
notype.cpp:19:   instantiated from `ComponentView'
notype.cpp:36:   instantiated from here
notype.cpp:30: error: no type named `Type_t' in `struct ComponentView'

which could be improved to mention that the missing type is caused by the aborted recursive instantiation of struct ComponentView. At the moment the diagnostic is at least misleading, as there is a Type_t in struct ComponentView.

--
Summary: Misleading diagnostic for recursive template instantiation
Product: gcc
Version: 3.4.3
Status: UNCONFIRMED
Severity: enhancement
Priority: P2
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18296
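The recursion described in the report can be sketched as follows. This is only a hypothetical reconstruction: the archived testcase has lost its template parameter lists, so every parameter list here is an assumption. That covers the Dim, T and EngineTag parameters (guessed from the icpc output), the two-parameter form of ComponentView, and the use of int as its first argument. The TRIGGER_RECURSION macro and the SameType helper are not part of the original either; they are added so the sketch compiles by default and can be checked.

```cpp
// Hypothetical reconstruction: all template parameter lists are assumptions,
// since the archived testcase lost its angle-bracket contents.
struct Brick;
template <class Eng, class Components> struct CompFwd;
template <int Dim, class T, class EngineTag> struct Engine;
template <int Dim, class T, class EngineTag> struct Array;
template <class Components, class Subject> struct ComponentView;

template <class Components, int Dim, class T, class EngineTag>
struct ComponentView<Components, Array<Dim, T, EngineTag> >
{
  typedef Array<Dim, T, EngineTag> Subject_t;
#ifdef TRIGGER_RECURSION
  // Looking inside Subject_t requires Array to be complete, so Array is
  // instantiated while this ComponentView is itself still incomplete.
  typedef typename Subject_t::Engine_t Engine_t;
#else
  // Naming the engine directly does not require Array to be instantiated.
  typedef Engine<Dim, T, EngineTag> Engine_t;
#endif
  typedef Array<Dim, T, CompFwd<Engine_t, Components> > Type_t;
};

template <int Dim, class T, class EngineTag>
struct Array
{
  typedef Engine<Dim, T, EngineTag> Engine_t;
  typedef Array<Dim, T, EngineTag> This_t;
  // Instantiating Array needs ComponentView<...>::Type_t. With
  // TRIGGER_RECURSION defined, this lookup happens before Type_t has been
  // declared in the enclosing, unfinished ComponentView instantiation.
  typename ComponentView<int, This_t>::Type_t comp(int i1) const;
};

typedef Array<1, double, Brick> Array_t;
typedef ComponentView<int, Array_t>::Type_t CView_t;

// Illustrative helper (not in the original) for checking what CView_t names.
template <class A, class B> struct SameType { static const bool value = false; };
template <class A> struct SameType<A, A> { static const bool value = true; };
```

Compiled as is (for example with g++ -c), the sketch is accepted. Compiling with -DTRIGGER_RECURSION restores the dependency cycle from the report (ComponentView needs Array, which in turn needs ComponentView::Type_t) and should be rejected with a diagnostic of the same shape as the one quoted above, although the exact wording depends on the compiler version.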
[Bug tree-optimization/13776] [4.0 Regression] [tree-ssa] Many C++ compile-time regression in 4.0-tree-ssa 040120
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-10-25 13:02 --- Subject: Re: [4.0 Regression] [tree-ssa] Many C++ compile-time regression in 4.0-tree-ssa 040120 And http://gcc.gnu.org/ml/gcc/2004-10/msg00955.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776
[Bug c/18042] [4.0 regression] does not handle struct initializer
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-10-17 21:51 --- Created an attachment (id=7369) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7369&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18042
[Bug c/18042] New: [4.0 regression] does not handle struct initializer
The testcase is rejected with

> gcc -c const.c
const.c:25: error: initializer element is not constant

The testcase is fine with any previous version of gcc. This is mainline from 20041017; a version from about two months ago was ok.

--
Summary: [4.0 regression] does not handle struct initializer
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18042
[Bug c++/10479] alignof and sizeof (and other expressions) in attributes does not compile inside template classes
--- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-10-16 15:42 ---
Subject: Re: alignof and sizeof (and other expressions) in attributes does not compile inside template classes

giovannibajo at libero dot it wrote:

> --- Additional Comments From giovannibajo at libero dot it 2004-10-16 11:06 ---
> Fixed in GCC 4.0.0. Thanks for your report!

Can this be trivially backported to 3.4? That would be cool.

Thanks, Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10479