[Bug tree-optimization/43854] names for compiler generated temporaries are too long
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2010-04-22 19:54 --- (In reply to comment #1) > >Also "pretmp" "prehitmp" and "ivtmp" prefixes are too long, > > They might be too long but they are useful long without looking too much into > the code to figure out what kind of temp they are. We could just use D.XYZ > instead without a "long name". Or using P. (or PR.), I. (or IV.) > Really debug dumps should be used to debug the > compiler which means having nice names sometimes makes it easier to debug. As shown above, the names can be both shorter and nice, it's possible to have both. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43854
[Bug middle-end/43855] New: assembly labels are too long
Assembly labels are generated like this: .LDECIMAL_NUMBER

If the hex version of the same number (or even better, base 32 or base 64) were used instead of DECIMAL_NUMBER, the total assembly size would be reduced. For combine.s the file size difference from using hex is 1.5% for -O2 -S (if I remember well).

Here's an emacs function that will estimate the size. Evaluate the function, open the .s file and do M-x my-estimate; it will show the size savings estimate.

(defun my-estimate ()
  (interactive)
  (let ((crt-size (point-max)))
    (goto-char (point-min))
    (while (re-search-forward "\\([.]L[A-Z]*\\)\\([0-9]+\\)" nil t)
      (replace-match (format "%s%x" (match-string 1) (string-to-number (match-string 2))) nil nil))
    (message "Size %% change = %f" (/ (* 100.0 (- (point-max) crt-size)) (point-max)))))

This is a rather simple-minded estimate, but it shouldn't be too far off. Things like .LBB and .LBE need to be considered carefully. This should help speed up the assembler a bit...

-- Summary: assembly labels are too long Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43855
[Bug tree-optimization/43854] New: names for compiler generated temporaries are too long
Looking at tree dumps, most variables used are compiler generated temporaries and they have names like pretmp.DECIMAL_NUMBER. If instead of DECIMAL_NUMBER the same number but in hex were used, this would reduce the memory used for those temporary names. This simple patch (that does not take care of all the temporaries, only a subset):

Index: defaults.h
===
--- defaults.h (revision 158360)
+++ defaults.h (working copy)
@@ -46,12 +46,12 @@
 #ifndef ASM_PN_FORMAT
 # ifndef NO_DOT_IN_LABEL
-# define ASM_PN_FORMAT "%s.%lu"
+# define ASM_PN_FORMAT "%s.%lx"
 # else
 # ifndef NO_DOLLAR_IN_LABEL
-# define ASM_PN_FORMAT "%s$%lu"
+# define ASM_PN_FORMAT "%s$%lx"
 # else
-# define ASM_PN_FORMAT "__%s_%lu"
+# define ASM_PN_FORMAT "__%s_%lx"
 # endif
 # endif
 #endif /* ! ASM_PN_FORMAT */

has this effect on the string pool (for an average size C file, dispnew.c from emacs):

Before: avg. entry 17.04 bytes (+/- 8.46)
After: avg. entry 16.99 bytes (+/- 8.50)

so it's something given how small the change was. The difference would be even bigger if base 32 or base 64 were used instead of hex, but that's a larger change... Also the "pretmp", "prehitmp" and "ivtmp" prefixes are too long, they could be one or two letters...

-- Summary: names for compiler generated temporaries are too long Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43854
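To make the effect of switching ASM_PN_FORMAT from "%lu" to "%lx" concrete, here is a minimal standalone sketch. It is not GCC code: PN_FORMAT_DEC, PN_FORMAT_HEX and private_name are hypothetical names used only for illustration, and the number 221 is just a sample taken from the pretmp.221 style of name seen in tree dumps.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two ASM_PN_FORMAT variants in the patch. */
#define PN_FORMAT_DEC "%s.%lu"   /* decimal suffix */
#define PN_FORMAT_HEX "%s.%lx"   /* proposed hex suffix */

/* Format PREFIX and N into BUF using FMT; return the name length
   (not counting the terminating NUL). */
static int private_name(char *buf, size_t size, const char *fmt,
                        const char *prefix, unsigned long n)
{
    int len = snprintf(buf, size, fmt, prefix, n);
    assert(len >= 0 && (size_t)len < size);   /* no error, no truncation */
    return len;
}
```

With these definitions, formatting ("pretmp", 221) gives "pretmp.221" in decimal and "pretmp.dd" in hex, one byte shorter. The saving only appears once the number needs more hex digits than decimal digits would allow, which fits the small average change reported for the string pool.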
[Bug tree-optimization/2462] "restrict" implementation bug
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2009-06-25 15:31 --- (In reply to comment #7) > With the new restrict implementation baz() works and all the rest would work > as well if the calls to link_error () would not cause the malloced memory > to be clobbered. The artifact here is that malloced memory is considered > global (we are not allowed to remove stores to it). The intention for link_error was to just make it easier to write a test, not to prohibit optimization. Please feel free to adjust the code accordingly. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=2462
[Bug tree-optimization/39068] signed short plus and signed char plus not vectorized
--- Comment #3 from dann at godzilla dot ics dot uci dot edu 2009-02-02 16:42 --- (In reply to comment #2) > (reminds me of a couple missed-optimization PRs where vectorization is also > failing due to casts - PR31873 , PR26128 - don't know if this is related) Are the casts actually needed in this case? It seems they get introduced very early on; the .original dump already has: a[i] = (short int) ((short unsigned int) b[i] + (short unsigned int) c[i]); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068
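As a rough illustration of the question above, the sketch below compares the addition as written in the source (operands promoted to int, result narrowed to short) with the form shown in the .original dump (addition narrowed to unsigned short). The function names are hypothetical. It assumes a 16-bit short and GCC's documented behaviour of reducing out-of-range values modulo 2^16 when converting to a signed type; other compilers may behave differently.

```c
#include <assert.h>
#include <limits.h>

/* Source form: b and c are promoted to int, added, then narrowed. */
static short add_promoted(short b, short c)
{
    return (short)(b + c);
}

/* Dump form: the addition is carried out modulo 2^16 in unsigned short. */
static short add_as_dump(short b, short c)
{
    return (short)(unsigned short)((unsigned short)b + (unsigned short)c);
}

/* Compare the two forms on a sample of the short range. */
static void check_casts_agree(void)
{
    for (int b = SHRT_MIN; b <= SHRT_MAX; b += 257)
        for (int c = SHRT_MIN; c <= SHRT_MAX; c += 263)
            assert(add_promoted((short)b, (short)c)
                   == add_as_dump((short)b, (short)c));
}
```

Under those assumptions both forms give the same result, so the casts do not change the value computed; they only change the type in which the addition is expressed.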
[Bug tree-optimization/39075] alignment for "unsigned short a[10000]" vs "extern unsigned short a[10000]"
--- Comment #1 from dann at godzilla dot ics dot uci dot edu 2009-02-02 14:50 --- This code:

unsigned short a[10000];
void test() { int i; for (i = 0; i < 10000; ++i) a[i] = 5; }

will be vectorized with -O3 -march=core2 to this:

.L2:
	movdqa %xmm0, a(%eax)
	addl $16, %eax
	cmpl $20000, %eax
	jne .L2

but this one:

extern unsigned short a[10000];
void test() { int i; for (i = 0; i < 10000; ++i) a[i] = 5; }

will get a lot of extra code before the loop because the vectorizer thinks it needs to do peeling for alignment:

test.c:7: note: Alignment of access forced using peeling.

Intel's compiler does not generate the extra peeling code.

-- dann at godzilla dot ics dot uci dot edu changed:
What|Removed |Added
Summary|alignment for "unsigned short a[10000 |alignment for "unsigned short a[10000]" vs "extern unsigned short a[10000]"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39075
[Bug tree-optimization/39075] New: alignment for "unsigned short a[10000
-- Summary: alignment for "unsigned short a[10000 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39075
[Bug tree-optimization/39068] New: signed short plus and signed char plus not vectorized
gcc -march=core2 -O3 -ftree-vectorizer-verbose=6 for this code: #define SIZE 1 signed short a[SIZE]; signed short b[SIZE]; signed short c[SIZE]; void add() { int i; for (i = 0; i < SIZE; ++i) a[i] = b[i] + c[i]; } cannot vectorize the loop: add_sshort.c:9: note: vect_model_load_cost: aligned. add_sshort.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 . add_sshort.c:9: note: not vectorized: relevant stmt not supported: D.1580_6 = (short unsigned int) D.1579_5 add_sshort.c:7: note: vectorized 0 loops in function. The same happens if the type for a,b and c is "signed char". But if the type is "unsigned short" or "unsigned char" the loop is vectorized. -- Summary: signed short plus and signed char plus not vectorized Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068
[Bug tree-optimization/39069] New: signed short plus and signed char plus not vectorized
gcc -march=core2 -O3 -ftree-vectorizer-verbose=6 for this code: #define SIZE 1 signed short a[SIZE]; signed short b[SIZE]; signed short c[SIZE]; void add() { int i; for (i = 0; i < SIZE; ++i) a[i] = b[i] + c[i]; } cannot vectorize the loop: add_sshort.c:9: note: vect_model_load_cost: aligned. add_sshort.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 . add_sshort.c:9: note: not vectorized: relevant stmt not supported: D.1580_6 = (short unsigned int) D.1579_5 add_sshort.c:7: note: vectorized 0 loops in function. The same happens if the type for a,b and c is "signed char". But if the type is "unsigned short" or "unsigned char" the loop is vectorized. -- Summary: signed short plus and signed char plus not vectorized Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39069
[Bug tree-optimization/15484] [tree-ssa] bool and short function arguments promoted to int
--- Comment #6 from dann at godzilla dot ics dot uci dot edu 2008-11-20 23:27 --- Still happens in 4.4. -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added Keywords||memory-hog http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15484
[Bug tree-optimization/27810] inefficient gimplification of function calls
--- Comment #4 from dann at godzilla dot ics dot uci dot edu 2008-11-20 18:43 --- Still happens with 4.4.0: qqq (int a) { int result.0; int D.1236; int result; result.0 = bar (a); result = result.0; D.1236 = result; return D.1236; } -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added Version|unknown |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810
[Bug middle-end/38204] New: PRE for post dominating expressions
For this function: int test (int a, int b, int c, int g) { int d, e; if (a) d = b * c; else d = b - c; e = b * c + g; return d + e; } the multiply expression is moved to both branches of the "if". It would be better to move it before the "if", since b * c is needed on every path to the return. Intel's compiler does that. -- Summary: PRE for post dominating expressions Product: gcc Version: 4.3.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i386-pc-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38204
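The requested transformation can be sketched by hand in C. This is only a source-level illustration of the idea, not what any GCC pass emits; pre_test mirrors the function from the report and pre_test_hoisted is a hypothetical hand-optimized version with the multiplication computed once before the branch.

```c
#include <assert.h>

/* Shape from the report: b * c appears in the "then" branch and
   again after the join. */
static int pre_test(int a, int b, int c, int g)
{
    int d, e;
    if (a)
        d = b * c;
    else
        d = b - c;
    e = b * c + g;
    return d + e;
}

/* Hand-hoisted shape: b * c is computed once, before the "if",
   because it is needed on every path to the return. */
static int pre_test_hoisted(int a, int b, int c, int g)
{
    int bc = b * c;
    int d = a ? bc : b - c;
    return d + (bc + g);
}

/* Small self-check that the two shapes agree on a few inputs. */
static void check_hoisting(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = -3; b <= 3; b++)
            for (int c = -3; c <= 3; c++)
                assert(pre_test(a, b, c, 5) == pre_test_hoisted(a, b, c, 5));
}
```

The hoisted version executes exactly one multiplication on either path, whereas duplicating the multiply into both branches costs code size without saving work.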
[Bug c++/13146] inheritance for nonoverlapping_component_refs_p
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2008-03-15 00:28 --- (In reply to comment #7) > The testcase is fixed by the SCCVN alias-oracle patch. Are you sure? I still see the problem (.final_cleanup dump): void bar(first*, multi*) (s1, s3) { : s1->f1 = 0; s3->f3 = 0; s1->f1 = s1->f1 + 1; s3->f3 = s3->f3 + 1; s1->f1 = s1->f1 + 1; s3->f3 = s3->f3 + 1; if (s1->f1 != 2) goto ; else goto ; : link_error () [tail call]; : return; } void foo(first*, second*) (s1, s2) { : s1->f1 = 0; s2->f2 = 0; s1->f1 = s1->f1 + 1; s2->f2 = s2->f2 + 1; s1->f1 = s1->f1 + 1; s2->f2 = s2->f2 + 1; if (s1->f1 != 2) goto ; else goto ; : link_error () [tail call]; : return; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13146
[Bug tree-optimization/27799] adding unused char field inhibits optimization
--- Comment #9 from dann at godzilla dot ics dot uci dot edu 2008-03-04 21:43 --- (In reply to comment #8) > Subject: Re: adding unused char field inhibits > optimization > > On Tue, 4 Mar 2008, dann at godzilla dot ics dot uci dot edu wrote: > > > --- Comment #7 from dann at godzilla dot ics dot uci dot edu > > 2008-03-04 21:32 --- > > (In reply to comment #6) > > > Actually RTL alias is just using the same routines. > > Might be, but the RTL level code that optimizes away the abort() in both > > testcases (if I remember well nonoverlapping_component_refs_p). > > I still have the abort () with -O2. Argghh, sorry, my bad: typo in the "grep abort file.s" command ... > > That is for this testcase, but what about the impact on .final_cleanup for > > something big like combine.c? > > No idea, but feel free to check. I don't have a recent build... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799
[Bug tree-optimization/27799] adding unused char field inhibits optimization
--- Comment #7 from dann at godzilla dot ics dot uci dot edu 2008-03-04 21:32 --- (In reply to comment #6) > Actually RTL alias is just using the same routines. Might be, but it is the RTL level code that optimizes away the abort() in both testcases (if I remember well, nonoverlapping_component_refs_p). > > # SMT.4_6 = VDEF > # SMT.5_7 = VDEF > x_1(D)->x = 0; > # SMT.5_8 = VDEF > y_2(D)->y = 1; > > vs. > > # SMT.18_5 = VDEF > x_1(D)->x = 0; > # SMT.19_7 = VDEF > y_2(D)->y = 1; That is for this testcase, but what about the impact on .final_cleanup for something big like combine.c? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799
[Bug tree-optimization/27799] adding unused char field inhibits optimization
--- Comment #5 from dann at godzilla dot ics dot uci dot edu 2008-03-04 21:19 --- (In reply to comment #4) > http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00243.html Thanks for working on this! Have you looked at the impact? Probably the generated code won't be too different because the RTL alias analysis probably catches this. But it would be interesting to see what the difference is for the tree dumps before and after this patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799
[Bug middle-end/31575] Extra push+pop generated on x86
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2007-04-14 21:03 --- (In reply to comment #1) > This looks completely a register allocator issue and I think 4.2.0 and before > were just getting lucky. Also note that the extra push+pop are NOT generated when using -march=i386, -march=athlon, or -march=core2. But they ARE generated when using -march=i486, -march=i686, -march=pentium4 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31575
[Bug c/31575] New: Extra push+pop generated on x86
For this code:

struct data { unsigned data[4][1]; unsigned char valid[4]; unsigned char flags[4]; };
void def(struct data *info, int index, unsigned *v)
{
  if (info->flags[index]) {
    info->valid[index] = 1;
    info->data[index][0] = v[0];
  }
}

SVN HEAD generates an extra push+pop compared to 4.1.1 when compiling with -O2 -march=pentium4.

4.1.1 generates this code (3.4.3 generates identical code):

def:
	pushl %ebp
	movl %esp, %ebp
	movl 8(%ebp), %ecx
	movl 12(%ebp), %edx
	cmpb $0, 20(%edx,%ecx)
	je .L4
	movb $1, 16(%edx,%ecx)
	movl 16(%ebp), %eax
	movl (%eax), %eax
	movl %eax, (%ecx,%edx,4)
.L4:
	popl %ebp
	ret

SVN HEAD:

def:
	pushl %ebp
	movl %esp, %ebp
	pushl %ebx              <--- extra instruction
	movl 8(%ebp), %ecx
	movl 12(%ebp), %edx
	cmpb $0, 20(%ecx,%edx)
	je .L4
	movb $1, 16(%ecx,%edx)
	movl 16(%ebp), %ebx
	movl (%ebx), %eax
	movl %eax, (%ecx,%edx,4)
.L4:
	popl %ebx               <--- extra instruction
	popl %ebp
	ret

This is a regression from (at least) 3.4.3 and 4.1.1.

-- Summary: Extra push+pop generated on x86 Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31575
[Bug rtl-optimization/30643] New: CSE regression
CSE used to eliminate all the "if"s in the code below at least in gcc-3.x (and probably even earlier). Now in SVN HEAD it does not do it anymore. 4.1 still does it.

struct s { int a; int b;};
void bar (struct s *ps, int *p, int *__restrict__ rp, int *__restrict__ rq)
{
  ps->a = 0; ps->b = 1;
  if (ps->a != 0) abort ();
  p[0] = 0; p[1] = 1;
  if (p[0] != 0) abort ();
  rp[0] = 0; rq[0] = 1;
  if (rp[0] != 0) abort ();
}

-O2 assembly for SVN HEAD:

bar:
	subl $12, %esp
	movl 16(%esp), %eax
	movl 20(%esp), %edx
	movl 24(%esp), %ecx
	movl $0, (%eax)
	movl $1, 4(%eax)
	movl (%eax), %eax
	testl %eax, %eax
	jne .L20
	movl $0, (%edx)
	movl (%edx), %eax
	movl $1, 4(%edx)
	testl %eax, %eax
	jne .L20
	movl $0, (%ecx)
	movl (%ecx), %ecx
	movl 28(%esp), %eax
	testl %ecx, %ecx
	movl $1, (%eax)
	jne .L20
	addl $12, %esp
	ret
.L20:
	call abort

-O2 assembly for 4.1.1:

bar:
	movl 4(%esp), %eax
	movl 8(%esp), %edx
	movl $0, (%eax)
	movl $1, 4(%eax)
	movl 12(%esp), %eax
	movl $0, (%edx)
	movl $1, 4(%edx)
	movl $0, (%eax)
	movl 16(%esp), %eax
	movl $1, (%eax)
	ret

-- Summary: CSE regression Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30643
[Bug tree-optimization/27798] gimplifying "return CONSTANT" creates unneeded temporaties
--- Comment #5 from dann at godzilla dot ics dot uci dot edu 2007-01-28 22:04 --- (In reply to comment #2) > i.e. it misses to initialize the temporary with the result. Otherwise you > can play with variants of the following patch: Richard, have you tried to make this patch work? It seems that with all the work that goes into inlining now, this might help a bit by making some function bodies smaller and allowing the inliner to better estimate the actual size... > > Index: gimplify.c > === > *** gimplify.c (revision 114599) > --- gimplify.c (working copy) > *** gimplify_return_expr (tree stmt, tree *p > *** ,1116 > --- ,1124 > if (!result_decl > || aggregate_value_p (result_decl, TREE_TYPE (current_function_decl))) > result = result_decl; > + else if (/*is_gimple_formal_tmp_reg (TREE_OPERAND (ret_expr, 1)) > +||*/ is_gimple_min_invariant (TREE_OPERAND (ret_expr, 1)) > + /*is_gimple_val (TREE_OPERAND (ret_expr, 1))*/) > + { > + TREE_OPERAND (stmt, 0) = TREE_OPERAND (ret_expr, 1); > + > + return GS_ALL_DONE; > + } > else if (gimplify_ctxp->return_temp) > result = gimplify_ctxp->return_temp; > else > -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798
[Bug tree-optimization/30105] reassoc can sometimes get in the way of PRE
--- Comment #6 from dann at godzilla dot ics dot uci dot edu 2006-12-12 06:07 --- (In reply to comment #5) > (In reply to comment #1) > > Confirmed (but it's not PRE). > > > The second is smaller, and no more or less efficient since the addition is > calculated on both paths anyway. > > Both are valid results, and what RTL does with them is it's business. > > I don't believe you can claim they should generate identical assembly. > > The actual thing this testcase is trying to test, that load-PRE is performed, > has succeeded. > Thus i am closing this bug as WORKSFORME. > If you see something *actually wrong* with the result, rather than just > disassembly, please feel free to reopen. Here is a slightly modified example that shows that there's still a PRE opportunity void motion_test22(int * data, int i) { int j; if (data[1]) { data[data[2]] = 2; j = data[0] * data[3]; i = i * j; } data[4] = data[0] * data[3]; data[5] = i; } :; *((int *) ((unsigned int) *(data + 8B) * 4) + data) = 2; prephitmp.26 = *data; prephitmp.31 = *(data + 12B); i = prephitmp.26 * i * prephitmp.31; :; *(data + 16B) = prephitmp.31 * prephitmp.26; *(data + 20B) = i; return; There are 3 multiplications on the L0-L1 path. It should be possible to only have 2 multiplications on that path. -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|WORKSFORME | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30105
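For reference, the two-multiplication version of the path through the "if" can be written by hand as below. This is a source-level sketch only; motion_two_muls is a hypothetical name, and the point is just that the product data[0] * data[3] computed after the store inside the branch can be reused for the store to data[4].

```c
#include <assert.h>

/* Function from the comment above. */
static void motion_test22(int *data, int i)
{
    int j;
    if (data[1]) {
        data[data[2]] = 2;
        j = data[0] * data[3];
        i = i * j;
    }
    data[4] = data[0] * data[3];
    data[5] = i;
}

/* Hand-written variant: the product is computed once per path and
   reused, so the "then" path has two multiplications, not three. */
static void motion_two_muls(int *data, int i)
{
    int prod;
    if (data[1]) {
        data[data[2]] = 2;           /* may alias data[0] or data[3] */
        prod = data[0] * data[3];    /* so load and multiply after it */
        i = i * prod;
    } else {
        prod = data[0] * data[3];
    }
    data[4] = prod;
    data[5] = i;
}

/* Run both versions on copies of INIT and require identical results. */
static void check_same(const int *init, int i)
{
    int a[6], b[6];
    for (int k = 0; k < 6; k++)
        a[k] = b[k] = init[k];
    motion_test22(a, i);
    motion_two_muls(b, i);
    for (int k = 0; k < 6; k++)
        assert(a[k] == b[k]);
}
```

Reusing the product is valid because nothing is stored between the computation of j and the final data[0] * data[3] in the original function.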
[Bug tree-optimization/30104] missed code motion optimization (invariant control structures)
--- Comment #4 from dann at godzilla dot ics dot uci dot edu 2006-12-07 18:24 --- (In reply to comment #3) > unswitching would duplicate the whole loop here, so not exactly I think. But > if-conversion to > > j = COND_EXPR > > or > > j = 2 - (int)p; > > would make j loop invariant. if-conversion would solve this particular testcase, but the more general case of moving invariant control structures out of the loop is probably more interesting. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30104
[Bug tree-optimization/30105] New: missed PRE
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void motion_test2(int *data) { int j; int i = 1; if (data[1]) { data[data[2]] = 2; j = data[0] + data[3]; i = i + j; } data[4] = data[0] + data[3]; data[5] = i; } void motion_result2(int *data) { int j; int i = 1; if (data[1]) { data[data[2]] = 2; j = data[0] + data[3]; i = i + j; } else j = data[0] + data[3]; data[4] = j; data[5] = i; } -- Summary: missed PRE Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30105
[Bug tree-optimization/30104] New: missed code motion optimization (invariant control structures)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void motion_test10(int *data) { int j; int p = data[1]; int i = data[0]; do { if (p) j = 1; else j = 2; i = i + j; data[data[2]] = 2; } while (i < data[3]); } void motion_result10(int *data) { int j; int p = data[1]; int i = data[0]; if (p) j = 1; else j = 2; do { i = i + j; data[data[2]] = 2; } while (i < data[3]); } -- Summary: missed code motion optimization (invariant control structures) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30104
[Bug tree-optimization/30103] New: missed strength reduction optimization (test replacement)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void strength_test10(int *data) { int stop = data[3]; int i = 0; do { data[data[2]] = 21 * i; i = i + 1; } while (i < stop); } void strength_result10(int *data) { int stop = data[3] * 21; int i = 0; do { data[data[2]] = i; i = i + 21; } while (i < stop); } -- Summary: missed strength reduction optimization (test replacement) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30103
[Bug tree-optimization/30102] New: missed strength reduction optimization (irreducible loops)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void strength_test4(int *data) { int i; if (data[1]) { i = 2; goto here; } i = 0; do { i = i + 1; here: data[data[2]] = 2; } while (i * 21 < data[3]); } void strength_result4(int *data) { int i; if (data[1]) { i = 42; goto here; } i = 0; do { i = i + 21; here: data[data[2]] = 2; } while (i < data[3]); } -- Summary: missed strength reduction optimization (irreducible loops) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30102
[Bug tree-optimization/30101] New: missed value numbering optimization (cprop+valnum)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void vnum_test12(int *data) { int n; int stop = data[3]; int j = data[1]; int k = j; int i = 1; for (n=0; nhttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=30101
[Bug tree-optimization/30100] New: missed value numbering optimization (conditional value numbers)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void vnum_test11(int *data) { int n; int stop = data[3]; int j = data[1]; int k = j; int i = 1; for (n=0; nhttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=30100
[Bug tree-optimization/30099] New: missed value numbering optimization (conditional-based assertions)
The following 2 functions should be compiled to the same assembly. This is one of Briggs' compiler benchmarks. void vnum_test10(int *data) { int i = data[0]; int m = i + 1; int j = data[1]; int n = j + 1; data[2] = m + n; if (i == j) data[3] = (m - n) * 21; } void vnum_result10(int *data) { int i = data[0]; int m = i + 1; int j = data[1]; int n = j + 1; data[2] = m + n; if (i == j) data[3] = 0; } -- Summary: missed value numbering optimization (conditional-based assertions) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30099
[Bug tree-optimization/30098] New: missed value numbering optimization
The following 2 functions should be compiled to the same thing. This is a test from Briggs' compiler benchmarks. void vnum_test8(int *data) { int i; int stop = data[3]; int m = data[4]; int n = m; for (i=0; ihttp://gcc.gnu.org/bugzilla/show_bug.cgi?id=30098
[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero
--- Comment #3 from dann at godzilla dot ics dot uci dot edu 2006-09-04 17:56 --- This specific case can probably be solved at the tree level by changing the test: (nb >> 5) != 0 to nb >= 32 (assuming nb is unsigned) -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added CC| |dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946
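A quick way to sanity-check the proposed rewrite of the shift test is the sketch below. It assumes nb is unsigned, which the comment does not state; for an unsigned value, (nb >> 5) != 0 holds exactly when nb >= 32, i.e. nb > 31. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Test as written in the original code. */
static int test_shift(unsigned nb)
{
    return (nb >> 5) != 0;
}

/* Comparison form, valid only because nb is assumed unsigned. */
static int test_compare(unsigned nb)
{
    return nb >= 32;
}

/* Check agreement around the boundary and at the extremes. */
static void check_shift_test(void)
{
    for (unsigned nb = 0; nb <= 1024; nb++)
        assert(test_shift(nb) == test_compare(nb));
    assert(test_shift(UINT_MAX) == test_compare(UINT_MAX));
}
```

For a signed nb the two forms can differ on negative values, so the replacement would need a separate argument in that case.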
[Bug tree-optimization/27810] inefficient gimplification of function calls
--- Comment #3 from dann at godzilla dot ics dot uci dot edu 2006-06-20 19:09 --- More data: for PR8361 the number of functions in the .gimple dump is 5045, the number of functions in the cleanup_cfg dump is 1341. The majority of the functions that are eliminated are small functions, for those the extra overhead due to inefficiencies in gimplification is significant. Maybe the people interested in compilation speed at -O0 (especially for C++) want to take a look at this and the related PRs... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810
[Bug tree-optimization/27798] gimplifying "return CONSTANT" creates unneeded temporaties
--- Comment #3 from dann at godzilla dot ics dot uci dot edu 2006-06-13 14:42 --- One of the issues with this PR and also 27800, 27809 and 27810 is that this extra work/memory allocation is done for a number of functions that are never used, like all the inline functions present in the glibc headers. These functions are thrown out after gimplification... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798
[Bug tree-optimization/27809] inefficient gimplification of globals
--- Comment #3 from dann at godzilla dot ics dot uci dot edu 2006-06-13 14:22 --- (In reply to comment #2) > (In reply to comment #1) > > Hmm, it should have produced G.3, G.n, at least I would have thought. > > > > No, we intentionally use the same variable for the lexically identical > expressions, see internal_get_tmp_var/lookup_tmp_var. Original intention was > to make PRE and other redundancy elimination optimization passes more > efficient > (this was essential especially for the old SSAPRE pass that used lexical > equality of expressions to check for redundancies). These reasons are no > longer relevant, but keeping the code saves a significant amount of memory and > compile time (I tried removing the code a few months ago, but since it slows > down compilation by some 1-2%, I never bothered with posting the patch). Using the same variable is surely good, but wouldn't it be even better to not create the redundant "G.2 = G" assignments in this PR? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27809
[Bug middle-end/27896] inefficient lowering for return
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2006-06-13 14:18 --- Add Diego to the CC list as per his request. -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added CC||dnovillo at redhat dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27896
[Bug tree-optimization/27896] New: inefficient lowering for return
The .lower dump for this code: int foo (void) { return 1;} looks like: foo () { goto ; :; return 1; } The "goto" to the next line is useless; it just increases the memory usage and needs extra work to be eliminated in a subsequent pass... -- Summary: inefficient lowering for return Product: gcc Version: 4.0.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27896
[Bug tree-optimization/27810] inefficient gimplification of function calls
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2006-05-31 21:47 --- My guesstimate is that for combine.i about 5-8% of the total number of expressions in the gimple dump are due to the gimplification inefficiencies shown in PRs 27798 27800 27809 27810, so these issues might have a compilation time impact if fixed... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810
[Bug tree-optimization/27810] New: inefficient gimplification of function calls
int qqq (int a) { int result; result = bar (a); return result;} is gimplified to: qqq (a) { int D.2147; int D.2148; int result; D.2147 = bar (a); result = D.2147; D.2148 = result; return D.2148; } The D.2147 variable is redundant, the result of "bar" can be directly assigned to "result". Doing this just increases the memory footprint... (PR27800 is about the fact that D.2148 is also redundant) -- Summary: inefficient gimplification of function calls Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810
[Bug tree-optimization/27809] New: inefficient gimplification of globals
This code: int G; int lll (int a) { bar (G, G, G, G); } is gimplified like this: lll (a) { int G.2; G.2 = G; G.2 = G; G.2 = G; G.2 = G; bar (G.2, G.2, G.2, G.2); } Creating that many identical expressions is wasteful... -- Summary: inefficient gimplification of globals Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27809
[Bug tree-optimization/27800] extra temprorary created when gimplifying return
--- Comment #1 from dann at godzilla dot ics dot uci dot edu 2006-05-29 20:51 --- An even simpler example which occurs quite frequently in programs: int jjj (int a){ return bar (a); } jjj (a) { int D.1891; int D.1892; D.1892 = bar (a); D.1891 = D.1892; return D.1891; } -- dann at godzilla dot ics dot uci dot edu changed: What|Removed |Added Summary|extra temprorary created|extra temprorary created |when gimplifying return |when gimplifying return http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27800
[Bug tree-optimization/27800] New: extra temprorary created when gimplifying return
One would think that the temporaries created when gimplifying the following 2 functions would be the same: void hhh (int a, int b, int c){ bar (a?b:c); } int iii (int a, int b, int c){ return (a?b:c); } But they are not: hhh (a, b, c) { int iftmp.0; if (a != 0) { iftmp.0 = b; } else { iftmp.0 = c; } bar (iftmp.0); } This one is fine. But this one: iii (a, b, c) { int D.2128; int iftmp.1; if (a != 0) { iftmp.1 = b; } else { iftmp.1 = c; } D.2128 = iftmp.1; return D.2128; } creates an extra temporary for the return expression. It would be more memory efficient if it would just use "iftmp.1" as the first function does. -- Summary: extra temprorary created when gimplifying return Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27800
[Bug tree-optimization/27799] New: adding unused char field inhibits optimization
For this code: struct X {double m; int x;}; struct Y {int y; short d;}; struct YY {int y; short d; char c;}; int foo(struct X *x, struct Y *y) { x->x = 0; y->y = 1; if (x->x != 0) abort (); } int foo_no(struct X *x, struct YY *y) { x->x = 0; y->y = 1; if (x->x != 0) abort (); } the "if" does not get optimized away (by the dom1 pass) for the "foo_no" function, but it is optimized away for "foo". The only difference between the 2 functions is that foo_no takes as a parameter a pointer to a struct that has a "char" field that is not accessed in this function. It would be nice if both functions were optimized in the same way. -- Summary: adding unused char field inhibits optimization Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799
[Bug tree-optimization/27798] New: gimplifying "return CONSTANT" creates unneeded temporaries
int zero (void) { return 0; } is gimplified to: zero () { int D.2115; D.2115 = 0; return D.2115; } The D.2115 temporary is not needed: the return value is constant, it is of the same type as the function return type, and return CONSTANT is valid gimple. Not creating the temporary should save some memory and processing time later. -- Summary: gimplifying "return CONSTANT" creates unneeded temporaries Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798
[Bug tree-optimization/27441] New: VAR - 1 not identified as the same as VAR + -1
It seems that neither FRE nor PRE can determine that stride.115 - 1 is the same as stride.115 + -1 in the example below (taken from the comm3 function in mgrid from SPEC2000). (Or am I missing something?) : stride.115 = *n; stride.117 = stride.115 * stride.115; offset.118 = ~stride.115 - stride.117; D.1969 = stride.115 - 1; if (D.1969 > 1) goto ; else goto ; :; pretmp.221 = stride.115 + -1; pretmp.228 = offset.118 + pretmp.221; pretmp.236 = offset.118 + stride.115; i3 = 2; -- Summary: VAR - 1 not identified as the same as VAR + -1 Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27441
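The equivalence that the report expects FRE/PRE to recognize can be illustrated with a small C sketch. This is only an illustration at the source level, not a model of how value numbering works inside GCC; the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The two spellings that show up in the dump (helper names are hypothetical). */
static int minus_one(int stride)    { return stride - 1; }
static int plus_neg_one(int stride) { return stride + -1; }

/* Compare both forms on a few sample values.  INT_MIN is left out on
   purpose: both forms would overflow there, which is undefined in C. */
static void check_forms_agree(void)
{
    static const int samples[] = { INT_MIN + 1, -2, -1, 0, 1, 2, 50, INT_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(minus_one(samples[i]) == plus_neg_one(samples[i]));
}
```

Calling check_forms_agree() from a test driver exercises both forms; an optimizer that canonicalized one spelling into the other would only need to compute the value once.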
[Bug target/27440] [4.0/4.1/4.2 regression] code quality regression due to ivopts
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2006-05-04 23:09 --- (In reply to comment #1) > IV-OPTs just gets info from the target. Now if the target says weird > addressing mode is the same as cheap ones, what do you think will happen? Does IV-OPTs also take into consideration the cost of having 2 IVs instead of 1? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440
[Bug tree-optimization/27440] New: [4.0/4.1/4.2 regression] code quality regression due to ivopts
Compiling this code with 3.4.6 void fill2 (unsigned int *arr, unsigned int val, unsigned int start, unsigned int limit) { unsigned int i; for (i = start; i < start + limit; i++) arr[i] = val; } generates: .L10: movl %ecx, (%ebx,%eax,4) incl %eax .L8: cmpl %eax, %edx ja .L10 4.0/4.1/4.2 -O2 generate: .L4: incl %edx movl %esi, (%eax) addl $4, %eax cmpl %ecx, %edx jne .L4 which is both slower and bigger. Using -O2 -fno-ivopts the result is much better: .L4: movl %ecx, (%ebx,%eax,4) incl %eax cmpl %edx, %eax jb .L4 The difference in the .final_cleanup dump with and without ivopts is obvious: With ivopts: void * ivtmp.29; unsigned int ivtmp.26; unsigned int D.1290; : D.1290 = start + limit; if (start < D.1290) goto ; else goto ; :; ivtmp.29 = arr + (unsigned int *) (start * 4); ivtmp.26 = 0; :; MEM[base: (unsigned int *) ivtmp.29] = val; ivtmp.26 = ivtmp.26 + 1; ivtmp.29 = ivtmp.29 + 4B; if (ivtmp.26 != D.1290 - start) goto ; else goto ; :; return; Without ivopts: unsigned int i; unsigned int D.1290; : D.1290 = start + limit; if (start < D.1290) goto ; else goto ; :; i = start; :; *((unsigned int *) (i * 4) + arr) = val; i = i + 1; if (i < D.1290) goto ; else goto ; :; return; The "void * ivtmp.29" is created by the ivopts pass. Why is it a void * when it is known to be assigned to an unsigned int *? Note that loops like the one in this example are quite common. For example in the assembly for PR8361 there are about 37 "fill" functions with very similar code (they are instantiations of 2 different templates, but still...) -- Summary: [4.0/4.1/4.2 regression] code quality regression due to ivopts Product: gcc Version: 4.0.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440
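To make the difference between the two dumps easier to see, here is a rough C rendering of both loop shapes. It is a sketch written by hand from the dumps above, not compiler output; the function names and the check_fill helper are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Shape without ivopts: a single induction variable used as an index. */
static void fill_indexed(unsigned int *arr, unsigned int val,
                         unsigned int start, unsigned int limit)
{
    for (unsigned int i = start; i < start + limit; i++)
        arr[i] = val;
}

/* Shape with ivopts: a pointer induction variable (ivtmp.29 in the dump)
   plus a separate counter (ivtmp.26) that is only used for the exit test. */
static void fill_two_ivs(unsigned int *arr, unsigned int val,
                         unsigned int start, unsigned int limit)
{
    unsigned int end = start + limit;
    if (start < end) {
        unsigned int *p = arr + start;
        unsigned int count = 0;
        do {
            *p = val;
            count = count + 1;
            p = p + 1;
        } while (count != end - start);
    }
}

/* Both shapes should store exactly the same elements. */
static void check_fill(void)
{
    unsigned int a[8] = { 0 }, b[8] = { 0 };
    fill_indexed(a, 7u, 2u, 3u);
    fill_two_ivs(b, 7u, 2u, 3u);
    for (size_t i = 0; i < 8; i++)
        assert(a[i] == b[i]);
}
```

The second shape keeps two values live and updates both on every iteration, which matches the extra addl in the 4.x assembly.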
[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2006-05-03 21:53 --- WRT this code generated by tree-ch: D.1305_41 = Int_Loc_3 + 1; if (Int_Loc_3 <= D.1305_41) goto ; else goto ; AFAICT there's exactly one value for which the comparison can be false; IMO it would be better to test for that value directly instead of generating a new SSA name and another expression. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944
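A small C sketch of that observation, under the assumption that the increment wraps around in two's complement. ISO C leaves signed overflow undefined, so the wraparound is modelled explicitly here; all function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* x + 1 with wraparound, modelled explicitly so that no signed overflow
   (undefined behaviour in C) is ever executed. */
static int wrapping_increment(int x)
{
    if (x == INT_MAX)
        return INT_MIN;
    return x + 1;
}

/* The comparison as generated by tree-ch: x <= x + 1. */
static int compare_form(int x) { return x <= wrapping_increment(x); }

/* Testing directly for the single value that makes it false. */
static int direct_form(int x) { return x != INT_MAX; }

/* The two forms agree on the boundary values and a few ordinary ones. */
static void check_single_false_value(void)
{
    static const int samples[] = { INT_MIN, -1, 0, 1, 49, INT_MAX - 1, INT_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(compare_form(samples[i]) == direct_form(samples[i]));
}
```

Under this assumption the only input for which the comparison is false is INT_MAX, so the test needs neither the addition nor the extra temporary.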
[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
--- Comment #5 from dann at godzilla dot ics dot uci dot edu 2006-05-03 18:54 --- IMO Comment #4 does not look close enough at what is actually happening. IMO tree-ch is the root cause here. The code looks like this before .ch Before .ch goto (); :; D.1301_54 = Int_Loc.0_4 * 200; D.1302_55 = (int[50] *) D.1301_54; D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55; (*D.1303_56)[Int_Index_1] = Int_Loc_3; Int_Index_58 = Int_Index_1 + 1; # Int_Index_1 = PHI ; :; D.1305_26 = Int_Loc_3 + 1; if (Int_Index_1 <= D.1305_26) goto ; else goto ; :; after .ch it looks like this: D.1305_41 = Int_Loc_3 + 1; if (Int_Loc_3 <= D.1305_41) goto ; else goto ; <-- this just complicates the CFG. Look below to see what are the effects of doing this in later passes. Plus just look at the comparison ... # Int_Index_37 = PHI ; :; D.1301_54 = Int_Loc.0_4 * 200; D.1302_55 = (int[50] *) D.1301_54; D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55; (*D.1303_56)[Int_Index_37] = Int_Loc_3; Int_Index_58 = Int_Index_37 + 1; D.1305_26 = Int_Loc_3 + 1; if (D.1305_26 >= Int_Index_58) goto ; else goto ; :; Given the above CFG, critical edge splitting transforms this into: D.1305_41 = Int_Loc_3 + 1; if (Int_Loc_3 <= D.1305_41) goto ; else goto ; :; goto (); :; # Int_Index_37 = PHI ; :; D.1301_54 = Int_Loc.0_4 * 200; D.1302_55 = (int[50] *) D.1301_54; D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55; (*D.1303_56)[Int_Index_37] = Int_Loc_3; Int_Index_58 = Int_Index_37 + 1; if (D.1305_41 >= Int_Index_58) goto ; else goto ; :; goto (); :; :; Given the above CFG PRE will dutifully fill with code a lot of the empty basic blocks: after pre D.1305_41 = Int_Loc_3 + 1; if (Int_Loc_3 <= D.1305_41) goto ; else goto ; :; pretmp.34_45 = Int_Loc.0_4 * 200; pretmp.36_57 = (int[50] *) pretmp.34_45; pretmp.38_25 = Arr_2_Par_Ref_30 + pretmp.36_57; goto (); :; pretmp.30_26 = Int_Loc.0_4 * 200; pretmp.31_19 = (int[50] *) pretmp.30_26; pretmp.32_1 = pretmp.31_19 + Arr_2_Par_Ref_30; # Int_Index_37 = PHI ; :; D.1301_54 = pretmp.30_26; D.1302_55 = 
pretmp.31_19; D.1303_56 = pretmp.32_1; (*D.1303_56)[Int_Index_37] = Int_Loc_3; Int_Index_58 = Int_Index_37 + 1; if (D.1305_41 >= Int_Index_58) goto ; else goto ; :; goto (); :; # prephitmp.39_23 = PHI ; # prephitmp.37_53 = PHI ; # prephitmp.35_49 = PHI ; :; Now when using -fno-tree-ch before critical edge splitting the code looks like this: goto (); :; D.1301_54 = Int_Loc.0_4 * 200; D.1302_55 = (int[50] *) D.1301_54; D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55; (*D.1303_56)[Int_Index_1] = Int_Loc_3; Int_Index_58 = Int_Index_1 + 1; # Int_Index_1 = PHI ; :; D.1305_26 = Int_Loc_3 + 1; if (Int_Index_1 <= D.1305_26) goto ; else goto ; :; after crited it looks like this: (i.e. no change) goto (); :; D.1301_54 = Int_Loc.0_4 * 200; D.1302_55 = (int[50] *) D.1301_54; D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55; (*D.1303_56)[Int_Index_1] = Int_Loc_3; Int_Index_58 = Int_Index_1 + 1; # Int_Index_1 = PHI ; :; D.1305_26 = Int_Loc_3 + 1; if (Int_Index_1 <= D.1305_26) goto ; else goto ; :; and after PRE goto (); :; D.1301_54 = pretmp.31_49; D.1302_55 = pretmp.32_45; D.1303_56 = pretmp.33_41; (*D.1303_56)[Int_Index_1] = Int_Loc_3; Int_Index_58 = Int_Index_1 + 1; # Int_Index_1 = PHI ; :; D.1305_26 = pretmp.30_19; if (Int_Index_1 <= D.1305_26) goto ; else goto ; :; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944
[Bug tree-optimization/15911] VRP/DOM does not like TRUTH_AND_EXPR
--- Comment #28 from dann at godzilla dot ics dot uci dot edu 2006-04-30 19:25 --- Just a note: fixing the problem in this PR would fix the only remaining failure for cprop in Briggs' compiler benchmarks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15911
[Bug tree-optimization/27365] New: add a way to mark that a path cannot be taken, something like __builtin_unreachable()
It would be nice to have some form of a builtin that marks a portion of the code as not reachable and generates no code in the binary. gcc_unreachable() is used now in the gcc sources for this, but it generates assembly code that calls abort(). Another way to accomplish the same thing could be with attributes. Can attributes be used for function calls? I believe right now they can't. If they could, then something like this could work: myfunc(foo,bar,baz) __attribute__((noreturn)); Some functions are known not to return only in certain situations, so they cannot be declared as being "noreturn". An example where this would be useful is the Fsignal function in emacs. -- Summary: add a way to mark that a path cannot be taken, something like __builtin_unreachable() Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27365
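For illustration, GCC releases later than the ones discussed in this report (and Clang) do provide a builtin named __builtin_unreachable(). The sketch below assumes such a compiler; the enum and function names are hypothetical and are not taken from gcc or emacs.

```c
#include <assert.h>

/* Hypothetical example type. */
enum shape { SHAPE_SQUARE, SHAPE_TRIANGLE };

static int corner_count(enum shape s)
{
    switch (s) {
    case SHAPE_SQUARE:
        return 4;
    case SHAPE_TRIANGLE:
        return 3;
    }
    /* Tells the compiler that control never gets here, so it does not
       have to emit a call to abort() or any fall-through code.  Actually
       reaching this point would be undefined behaviour. */
    __builtin_unreachable();
}

static void check_corner_count(void)
{
    assert(corner_count(SHAPE_SQUARE) == 4);
    assert(corner_count(SHAPE_TRIANGLE) == 3);
}
```

Unlike an abort()-based macro, this form is a promise rather than a check, so it only makes sense where the path is known to be impossible.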
[Bug target/26949] New: [4.2 regression] worse code generated for -march=pentium4
Compiling the code in PR26944 with -O2 -march=pentium4 -fno-tree-ch generates this for the loop: .L3: movl %esi, -4(%eax) addl $1, %edx addl $4, %eax cmpl -16(%ebp), %edx <- note an extra memory access here jle .L3 Compiling for -march=i686 (or even just adding -fomit-frame-pointer) generates: .L3: addl $1, %ecx movl %ebx, -4(%edx) addl $4, %edx cmpl %eax, %ecx <- no memory access here jle .L3 The above problem does not happen with gcc-4.0.3 or 4.1.0. -- Summary: [4.2 regression] worse code generated for -march=pentium4 Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949
[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2006-03-30 16:43 --- (In reply to comment #1) > Note that this may be also PRE confusing SCEV in presence of loop headers. Talking about PRE, here's a possibly interesting observation in the PRE dump: :; pretmp.30_53 = Int_Loc.0_4 * 200; pretmp.32_23 = (int[50] *) pretmp.30_53; pretmp.32_11 = pretmp.32_23 + Arr_2_Par_Ref_30; goto (); :; pretmp.27_59 = Int_Loc.0_4 * 200; pretmp.28_45 = (int[50] *) pretmp.27_59; pretmp.28_49 = Arr_2_Par_Ref_30 + pretmp.28_45; # Int_Index_37 = PHI ; :; D.1544_54 = pretmp.27_59; D.1545_55 = pretmp.28_45; D.1546_56 = pretmp.28_49; (*D.1546_56)[Int_Index_37] = Int_Loc_3; Int_Index_58 = Int_Index_37 + 1; if (D.1548_41 >= Int_Index_58) goto ; else goto ; :; goto (); :; # prephitmp.33_40 = PHI ; # prephitmp.33_18 = PHI ; # prephitmp.31_25 = PHI ; Compare pretmp.28_49 with pretmp.32_11: why are the arguments in a different order? Is there something unstable in the PRE algorithm? One has to wonder what the effects of tree-ch are on more complex loops. It might be interesting to test SPEC with and without tree-ch... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944
[Bug tree-optimization/26944] New: -ftree-ch generates worse code
The loop from the code below is compiled to this when using gcc-4.1 -O2 .L5: movl 16(%ebp), %eax addl %ecx, %eax addl $1, %ecx movl %edx, 20(%ebx,%eax,4) leal (%edx,%ecx), %eax cmpl %edi, %eax jle .L5 but the code is much better when using gcc -fno-tree-ch -O2 .L3: addl $1, %ecx movl %ebx, -4(%edx) addl $4, %edx cmpl %eax, %ecx jle .L3 This is a regression as gcc-3.4.3 generates similar code. The code is from Dhrystone as included in Unixbench. The regression is quite important as embedded processor people still use Dhrystone for benchmarking compiler/processor speed. It's strange that tree-ch messes up, as the loop is about as simple as loops can get. typedef int One_Fifty; typedef int Arr_1_Dim [50]; typedef int Arr_2_Dim [50] [50]; extern int Int_Glob; void Proc_8 (Arr_1_Par_Ref, Arr_2_Par_Ref, Int_1_Par_Val, Int_2_Par_Val) Arr_1_Dim Arr_1_Par_Ref; Arr_2_Dim Arr_2_Par_Ref; int Int_1_Par_Val; int Int_2_Par_Val; { register One_Fifty Int_Index; register One_Fifty Int_Loc; Int_Loc = Int_1_Par_Val + 5; Arr_1_Par_Ref [Int_Loc] = Int_2_Par_Val; Arr_1_Par_Ref [Int_Loc+1] = Arr_1_Par_Ref [Int_Loc]; Arr_1_Par_Ref [Int_Loc+30] = Int_Loc; for (Int_Index = Int_Loc; Int_Index <= Int_Loc+1; ++Int_Index) Arr_2_Par_Ref [Int_Loc] [Int_Index] = Int_Loc; Arr_2_Par_Ref [Int_Loc] [Int_Loc-1] += 1; Arr_2_Par_Ref [Int_Loc+20] [Int_Loc] = Arr_1_Par_Ref [Int_Loc]; Int_Glob = 5; } Intel's compiler generates even tighter code: ..B1.7: # Preds ..B1.10 ..B1.7 movl %ebx, (%ecx,%edx,4) #20.5 addl $1, %edx #19.55 cmpl %eax, %edx #19.3 jle ..B1.7 # Prob 80% #19.3 -- Summary: -ftree-ch generates worse code Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944
[Bug tree-optimization/26850] New: unused function not eliminated with -fwhole-program --combine
Compile these 2 files with gcc -O2 -fwhole-program --combine a.c b.c a.c: int main (void) { return 0;} b.c: static int tst1 (int x) {return x;} static int global_static; int global; int tst2 (int x, int y) {foo (tst1, x, y, &global_static, &global);} The generated assembly still contains the tst1 function. tst2, global and global_static have been eliminated. It seems that functions that have their address taken should be reconsidered for elimination after eliminating the functions (and variables too) that took their address. Note that in the above case compiling the files separately will generate less code as the whole b.o file will be eliminated by the linker... -- Summary: unused function not eliminated with -fwhole-program --combine Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26850
[Bug middle-end/23488] [4.1/4.2 Regression] GCSE load PRE does not work with non sets
--- Comment #18 from dann at godzilla dot ics dot uci dot edu 2006-03-03 02:14 --- (In reply to comment #17) > (In reply to comment #5) > > It's strange that the load(*) does not get optimized, given that it's in the > > same BB as the store that precedes it... > > > >movl%eax, result.1282 > > (*)movlresult.1282, %eax > > This is because the copying of the trace is happening at the very end of the > optimization phase so it does not optimized at all. Right, the copying happens in .bbro (as shown in PR26537). gcc-4.0 did the same kind of copying in .bbro, but it did not generate the redundant mov. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23488
[Bug rtl-optimization/26537] New: Basic block reordering inserts redundant instruction
This code: extern char *nl_langinfo (int) __attribute__ ((__nothrow__)); char * xtermEnvEncoding(void) { static char *result; if (result == 0) result = nl_langinfo(50); return result; } gets compiled by gcc-4.1.0 -march=i686 -mtune=i686 to: xtermEnvEncoding: [snip] .L6: movl $50, (%esp) call nl_langinfo movl %eax, result.1281 movl result.1281, %eax <- note this leave ret Note the redundant mov instruction. 4.0 does not generate that extra instruction. The extra instruction seems to be generated by the bbro pass. Here is the RTL dump for the .44.rnreg pass (nothing unusual): (call_insn:HI 17 16 19 1 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:SI ("nl_langinfo") [flags 0x41] ) [0 S1 A8]) (const_int 4 [0x4]))) 531 {*call_value_0} (nil) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (nil)) (insn:HI 19 17 20 1 (set (mem/f/c/i:SI (symbol_ref:SI ("result.1281") [flags 0x2] ) [2 result+0 S4 A32]) (reg:SI 0 ax [orig:58 D.1283 ] [58])) 34 {*movsi_1} (insn_list:REG_DEP_TRUE 18 (nil )) (expr_list:REG_DEAD (reg:SI 0 ax [orig:58 D.1283 ] [58]) (nil))) but the next dump, .45.bbro, shows that an extra move instruction has been inserted: (call_insn:HI 17 16 19 2 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:SI ("nl_langinfo") [flags 0x41] ) [0 S1 A8]) (const_int 4 [0x4]))) 531 {*call_value_0} (nil) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (nil)) (insn:HI 19 17 54 2 (set (mem/f/c/i:SI (symbol_ref:SI ("result.1281") [flags 0x2] ) [2 result+0 S4 A32]) (reg:SI 0 ax [orig:58 D.1283 ] [58])) 34 {*movsi_1} (insn_list:REG_DEP_TRUE 18 (nil )) (expr_list:REG_DEAD (reg:SI 0 ax [orig:58 D.1283 ] [58]) (nil))) (insn 54 19 55 2 (set (reg/f:SI 0 ax [orig:61 result ] [61]) (mem/f/c/i:SI (symbol_ref:SI ("result.1281") [flags 0x2] ) [2 result+0 S4 A32])) 34 {*movsi_1} (nil) (nil)) This problem is one of the causes for PR23153.
-- Summary: Basic block reordering inserts redundant instruction Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26537
[Bug tree-optimization/26251] [4.2 Regression] code size increase with -Os
--- Comment #4 from dann at godzilla dot ics dot uci dot edu 2006-02-13 02:34 --- Here's another testcase of what seems to be the same problem. The 4.2 assembly contains 2 calls to TabSet, 4.0 only has 1. (Both this and the first example are functions from xterm, in case anybody wonders.) typedef unsigned Tabs [10]; void TabSet(Tabs tabs, int col); void TabReset(Tabs tabs) { int i; for (i = 0; i < 10; ++i) tabs[i] = 0; for (i = 0; i < ((1 << 5) * 10); i += 8) TabSet(tabs, i); } void TabSet(Tabs tabs, int col) { tabs[((col) >> 5)] |= (1 << ((col) & ((1 << 5)-1))); } 4.2 assembly: TabReset: pushl %ebp movl $2, %eax movl %esp, %ebp pushl %esi movl 8(%ebp), %esi pushl %ebx movl $0, (%esi) .L4: movl $0, -4(%esi,%eax,4) incl %eax cmpl $11, %eax jne .L4 pushl $0 movl $8, %ebx pushl %esi call TabSet popl %ecx popl %eax .L6: pushl %ebx addl $8, %ebx pushl %esi call TabSet cmpl $320, %ebx popl %eax popl %edx jne .L6 leal -8(%ebp), %esp popl %ebx popl %esi popl %ebp ret -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26251
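For readers not familiar with the xterm code, the sketch below shows what the two functions in the testcase compute: each column is one bit, with col >> 5 selecting the word and col & 31 the bit inside it. It assumes a 32-bit unsigned, uses 1u instead of 1 so that bit 31 is shifted in unsigned arithmetic, and adds a TabIsSet helper that is hypothetical (it is not part of the testcase).

```c
#include <assert.h>

typedef unsigned Tabs[10];

/* Same indexing as the testcase: word col >> 5, bit col & 31. */
static void TabSet(Tabs tabs, int col)
{
    tabs[col >> 5] |= 1u << (col & ((1 << 5) - 1));
}

/* Hypothetical helper: read back the bit for one column. */
static int TabIsSet(const unsigned *tabs, int col)
{
    return (tabs[col >> 5] >> (col & 31)) & 1u;
}

/* As in the testcase: clear everything, then set every 8th column. */
static void TabReset(Tabs tabs)
{
    int i;
    for (i = 0; i < 10; ++i)
        tabs[i] = 0;
    for (i = 0; i < ((1 << 5) * 10); i += 8)
        TabSet(tabs, i);
}

/* After TabReset exactly the columns divisible by 8 are set. */
static void check_tabs(void)
{
    Tabs tabs;
    TabReset(tabs);
    for (int col = 0; col < 320; col++)
        assert(TabIsSet(tabs, col) == (col % 8 == 0));
}
```

Both loops in TabReset have constant trip counts (10 and 40), which is what makes the peeled first iteration in the 4.2 assembly pure code growth at -Os.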
[Bug tree-optimization/26251] New: code size increase with -Os
Compiling the function below with -Os -march=i686 -mtune=pentiumpro generates bigger code for 4.2 than for 4.0. The reason seems to be that 4.2 peels off one loop iteration. typedef unsigned Tabs [10]; void TabZonk(Tabs tabs) { int i; for (i = 0; i < 10; ++i) tabs[i] = 0; }

sdiff gcc-4.0.s gcc-4.2.s

TabZonk:                            TabZonk:
    pushl %ebp                          pushl %ebp
    movl $1, %eax                 |     movl $2, %eax
    movl %esp, %ebp                     movl %esp, %ebp
    movl 8(%ebp), %edx                  movl 8(%ebp), %edx
                                  >     movl $0, (%edx)
                                  >     .p2align 4,,15
.L2:                                .L2:
    movl $0, -4(%edx,%eax,4)      |     xorl %ecx, %ecx
                                  >     movl %ecx, -4(%edx,%eax,4)
    incl %eax                           incl %eax
    cmpl $11, %eax                      cmpl $11, %eax
    jne .L2                             jne .L2
    popl %ebp                           popl %ebp
    ret                                 ret

-- Summary: code size increase with -Os Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26251
[Bug c/26249] New: cc1 --help segfaults
On an up to date Fedora Core 4 system, with the latest update from svn today, cc1 --help segfaults. The configure command line was: ../gcc/configure --enable-languages=c --disable-checking --disable-nls --enable-gather-detailed-mem-stats --prefix=${HOME}/build/gcc-HEAD A gdb session: Starting program: /home/dann/build/gcc-HEAD/libexec/gcc/i686-pc-linux-gnu/4.2.0/cc1 --help Reading symbols from shared object read from target memory...done. Loaded system supplied DSO at 0xa7e000 The following options are language-independent: Program received signal SIGSEGV, Segmentation fault. 0x082bee9d in print_filtered_help (flag=536870912) at ../../gcc/gcc/opts.c:1335 1335 memset (printed, 0, cl_options_count); (gdb) bt #0 0x082bee9d in print_filtered_help (flag=536870912) at ../../gcc/gcc/opts.c:1335 #1 0x082c0147 in decode_options (argc=2, argv=0xbfac53c4) at ../../gcc/gcc/opts.c:1284 #2 0x083175a9 in toplev_main (argc=2, argv=0xbfac53c4) at ../../gcc/gcc/toplev.c:1970 #3 0x0809e2bf in main (argc=134857264, argv=0x219) at ../../gcc/gcc/main.c:35 -- Summary: cc1 --help segfaults Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26249
[Bug target/11877] gcc should use xor trick with -Os
--- Comment #9 from dann at godzilla dot ics dot uci dot edu 2006-01-05 20:22 --- (In reply to comment #7) > *** Bug 23338 has been marked as a duplicate of this bug. *** > Bug 23338 contained a patch that might fix this issue. Here it is, so that it can be evaluated. *** i386.md 08 Aug 2005 16:38:37 -0700 1.652 --- i386.md 11 Aug 2005 11:27:11 -0700 *** *** 18874,18881 [(match_scratch:SI 1 "r") (set (match_operand:SI 0 "memory_operand" "") (const_int 0))] ! "! optimize_size !&& ! TARGET_USE_MOV0 && TARGET_SPLIT_LONG_MOVES && get_attr_length (insn) >= ix86_cost->large_insn && peep2_regno_dead_p (0, FLAGS_REG)" --- 18874,18880 [(match_scratch:SI 1 "r") (set (match_operand:SI 0 "memory_operand" "") (const_int 0))] ! "! TARGET_USE_MOV0 && TARGET_SPLIT_LONG_MOVES && get_attr_length (insn) >= ix86_cost->large_insn && peep2_regno_dead_p (0, FLAGS_REG)" -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11877
[Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
--- Comment #6 from dann at godzilla dot ics dot uci dot edu 2005-12-18 22:57 --- (In reply to comment #5) > Simplified testcase seems to work for me on 4.1 branch: > restore_fpu: > movl 4(%esp), %edx > movl boot_cpu_data+12, %eax > testl $16777216, %eax 4.0 still does better: it uses a single "testb" instruction instead of 2 dependent movl + testl instructions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
[Bug rtl-optimization/25489] New: Suboptimal code generated for comparisons on Sparc
This code: typedef struct { int protected_mode; int x; } TScreen; extern void ClearRight (TScreen *screen, int n); extern void ClearLeft(TScreen * screen); extern void ClearLine(TScreen * screen); void do_erase_line(TScreen * screen, int param, int mode) { int saved_mode = screen->protected_mode; if (saved_mode == 1 && saved_mode != mode) screen->protected_mode = 0; switch (param) { case -1:/* DEFAULT */ case 0: ClearRight(screen, -1); break; case 1: ClearLeft(screen); break; case 2: ClearLine(screen); break; } screen->protected_mode = saved_mode; } is compiled to: (when using -O2 -mcpu=ultrasparc using gcc-4.0.2 and gcc-4.2) do_erase_line: save %sp, -112, %sp ld [%i0], %l0 xor %l0, 1, %g1 <- from here xor %l0, %i2, %i2 subcc %g0, %g1, %g0 subx %g0, -1, %g2 subcc %g0, %i2, %g0 addx %g0, 0, %g1 andcc %g2, %g1, %g0 <- to here bne,a,pt %icc, .LL2 st %g0, [%i0] .LL2: cmp %i1, 1 be,pn %icc, .LL6 nop [snip] The code generated for the "if" can be better implemented as (pseudoassembly): xor saved_mode, 1, tmp1 xnor saved_mode, mode, tmp2 orcc tmp1, tmp2 I don't know if this is a Sparc specific problem, or a general problem. -- Summary: Suboptimal code generated for comparisons on Sparc Product: gcc Version: 4.0.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: sparc-sun-solaris2.8 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25489
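As a reading aid, here is a C sketch of what the marked instruction sequence appears to compute: two xor results, each turned into a 0/1 flag, then combined with an and. This is an interpretation of the assembly above and not compiler output; the function names are hypothetical.

```c
#include <assert.h>

/* The condition as written in do_erase_line. */
static int cond_as_written(int saved_mode, int mode)
{
    return saved_mode == 1 && saved_mode != mode;
}

/* Branch-free evaluation in the style of the xor/subcc/subx/addx
   sequence: each xor result is zero exactly when its operands are equal,
   and each flag is materialized as 0 or 1 before the final and. */
static int cond_branch_free(int saved_mode, int mode)
{
    int is_one  = (saved_mode ^ 1) == 0;
    int differs = (saved_mode ^ mode) != 0;
    return is_one & differs;
}

/* The two forms agree on a small grid of values. */
static void check_conditions(void)
{
    for (int saved_mode = -2; saved_mode <= 3; saved_mode++)
        for (int mode = -2; mode <= 3; mode++)
            assert(cond_as_written(saved_mode, mode) ==
                   cond_branch_free(saved_mode, mode));
}
```

Materializing both flags costs four instructions here, which is the part of the sequence the report suggests could be shortened.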
[Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
--- Comment #2 from dann at godzilla dot ics dot uci dot edu 2005-11-13 02:47 --- Simplified testcase: struct cpuinfo_x86 { unsigned char x86; unsigned char x86_vendor; unsigned char x86_model; unsigned char x86_mask; char wp_works_ok; char hlt_works_ok; char hard_math; char rfu; int cpuid_level; unsigned long x86_capability[7]; } __attribute__((__aligned__((1 << (7))))); struct task_struct; extern void foo (struct task_struct *tsk); extern void bar (struct task_struct *tsk); extern struct cpuinfo_x86 boot_cpu_data; static inline __attribute__((always_inline)) int constant_test_bit(int nr, const volatile unsigned long *addr) { return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0; } void restore_fpu(struct task_struct *tsk) { if (constant_test_bit(24, boot_cpu_data.x86_capability)) foo (tsk); else bar (tsk); } The generated code for this simplified testcase shows one additional issue: restore_fpu: movl %eax, %edx movl boot_cpu_data+12, %eax ; edx could be used here testl $16777216, %eax ; and here je .L2 movl %edx, %eax ; then all the mov %eax, %edx and mov %edx, %eax jmp foo ; instructions could be eliminated. .p2align 4,,7 .L2: movl %edx, %eax jmp bar -- dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mov + mov + testl generated |[4.1 Regression] mov + mov +
                   |instead of testb            |testl generated instead of
                   |                            |testb

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
[Bug rtl-optimization/24810] mov + mov + testl generated instead of testb
--- Comment #1 from dann at godzilla dot ics dot uci dot edu 2005-11-11 19:29 --- Created an attachment (id=10220) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10220&action=view) Preprocessed code containing the functions that exhibit the problem -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
[Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb
Compiling i387.c from the Linux kernel using: -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float -m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts -mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4 -mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement -Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387 -c arch/i386/kernel/i387.c (these are the flags generated by rpmbuild on a Fedora Core 4 system) Using 4.0 the restore_fpu function looks like: restore_fpu: testb $1, boot_cpu_data+15 je .L23 [snip] Using 4.1 it looks like: restore_fpu: movl %eax, %edx movl boot_cpu_data+12, %eax testl $16777216, %eax je .L24 [snip] Similar code sequences appear in other functions in the same file: get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd, set_fpregs. The size of these functions increases by 5 bytes (i.e. 20%). It seems that some of these functions might be on some critical path in the kernel, so the size increase (and maybe speed penalty) could have an impact. For 4.0 the 00.expand dump looks like: (insn 9 7 10 1 (set (reg/f:SI 59) (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] ) (const_int 12 [0xc])))) -1 (nil) (nil)) (insn 10 9 11 1 (set (reg:SI 60) (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil) (nil)) (insn 11 10 12 1 (parallel [ (set (reg:SI 61) (and:SI (reg:SI 60) (const_int 16777216 [0x1000000]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 12 11 13 1 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 61) (const_int 0 [0x0]))) -1 (nil) (nil)) For 4.1 the dump is identical except for insn 10, which has mem/s/v/j:SI instead of mem/s/j:SI.
The combine pass of 4.0 deletes insn 10; that does not happen for 4.1. For 4.1 the generated code does not change when using -Os or -march=pentium4. This is one of the causes for PR23153. -- Summary: mov + mov + testl generated instead of testb Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dann at godzilla dot ics dot uci dot edu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
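The reason the byte-sized test is a valid replacement for the word-sized one can be sketched in C. On a little-endian target, bit 24 of the 32-bit word at offset 12 is bit 0 of the byte at offset 15. The sketch uses uint32_t as a stand-in for the 32-bit unsigned long of the i386 testcase; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Word-sized test, in the spirit of "testl $16777216, boot_cpu_data+12". */
static int test_bit_word(const unsigned char *base, int nr)
{
    uint32_t word;
    memcpy(&word, base + 4 * (nr >> 5), sizeof word);
    return (word & (UINT32_C(1) << (nr & 31))) != 0;
}

/* Byte-sized test, in the spirit of "testb $1, boot_cpu_data+15".
   Only equivalent to the word test on little-endian targets. */
static int test_bit_byte(const unsigned char *base, int nr)
{
    return (base[nr >> 3] & (1u << (nr & 7))) != 0;
}

/* Set each bit through a 32-bit store and read it back both ways. */
static void check_bit_tests(void)
{
    for (int nr = 0; nr < 7 * 32; nr++) {
        unsigned char caps[7 * 4] = { 0 };  /* stand-in for x86_capability[7] */
        uint32_t word = UINT32_C(1) << (nr & 31);
        memcpy(caps + 4 * (nr >> 5), &word, sizeof word);
        assert(test_bit_word(caps, nr));
        if (host_is_little_endian())
            assert(test_bit_byte(caps, nr));
    }
}
```

For nr = 24 the byte form reads base[3] with mask 1, i.e. offset 12 + 3 = 15 inside boot_cpu_data, which matches the 4.0 output.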
[Bug rtl-optimization/23523] peephole2 causes code size on i686
--- Comment #12 from dann at godzilla dot ics dot uci dot edu 2005-11-03 07:51 --- (In reply to comment #11) > (In reply to comment #10) > > I am not sure what kind of answer you expect here. > > Speed and code size are not disjoint. Think about I-cache and I-TLB misses. > But again who is using an pentiumpro machine any more. People who really care Using -march=pentium3 or -march=pentium-m generates the same code. If you want to close this bug please address the technical issues about peepholes in comment #8. -- dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523
[Bug target/23524] [4.1 Regression] bigger version of mov + cmp produced
--- Comment #16 from dann at godzilla dot ics dot uci dot edu 2005-11-03 06:42 --- (In reply to comment #15) > (In reply to comment #11) > > And FWIW there is also a problem with this insn, the length is wrong: > > > > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61]) > > #(mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} > > (nil) > > #(expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32]) > > #(nil))) > > A1 movlx, %eax # 11*movsi_1/1 [length = 6] > > FYI: This problem is addressed in patch at > http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html. Do you know if your patch also fixes this PR? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug rtl-optimization/23523] peephole2 causes code size on i686
--- Comment #10 from dann at godzilla dot ics dot uci dot edu 2005-11-03 02:34 --- (In reply to comment #9) > Have you tested the speed? As I said I really doubt it makes a real world > change in terms of speed. This is different from code size. I am not sure what kind of answer you expect here. Speed and code size are not disjoint. Think about I-cache and I-TLB misses. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523
[Bug rtl-optimization/23523] peephole2 causes code size on i686
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2005-11-03 02:12 --- (In reply to comment #6) > The use of ax vs cx will not matter in the real world. This is from a real world program (xterm) and it seems to matter: when using eax the code is smaller. Are you sure that the fact that eax is not used does not cover some other problem? Are the free registers picked at random for peepholes? It might be the case that 4.0 was just using eax by chance, but that does not mean the PR should be dismissed as invalid without understanding the underlying problem. -- dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523
[Bug rtl-optimization/23523] peephole2 causes code size on i686
--- Comment #5 from dann at godzilla dot ics dot uci dot edu 2005-11-03 01:27 --- (In reply to comment #4) > This is actually invalid as nothing happens for -Os case so what you are > seeing > is just an atrifact. Sorry but this explanation for marking the PR invalid does not make sense. The code in question is generated using -O2 not -Os. IMO the observation in comment #3 is important, and there should be some explanation for it. -- dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523
[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86
--- Comment #11 from dann at godzilla dot ics dot uci dot edu 2005-11-03 00:59 --- A very useful tool for comparing function sizes in two binaries or object files is: ftp://ftp.firstfloor.org/pub/ak/bloat-o-meter It displays the function names, their sizes, and the size difference both as an absolute value and as a percentage. It would be nice to have something like this in gcc/contrib. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153
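To illustrate the kind of comparison described in this comment, here is a minimal sketch of a per-function size diff. It is not the bloat-o-meter script itself. It assumes its input is text in the three-column "size type name" form that `nm --size-sort` prints, and the helper names (`parse_sizes`, `compare_sizes`) and the sample symbol tables are hypothetical.

```python
def parse_sizes(nm_text):
    """Map function name -> size in bytes.

    Assumes each line looks like "<hex size> <type> <name>", which is the
    layout `nm --size-sort` is expected to produce (an assumption here).
    Only text (code) symbols are kept.
    """
    sizes = {}
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a plain symbol line
        size_hex, sym_type, name = parts
        if sym_type in ("t", "T"):
            sizes[name] = int(size_hex, 16)
    return sizes


def compare_sizes(old, new):
    """Return (name, old size, new size, delta, percent) rows, largest change first.

    percent is None for functions that did not exist in the old file.
    """
    rows = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, 0)
        after = new.get(name, 0)
        delta = after - before
        if delta == 0:
            continue
        percent = 100.0 * delta / before if before else None
        rows.append((name, before, after, delta, percent))
    rows.sort(key=lambda row: -abs(row[3]))
    return rows


# Hypothetical symbol tables for two builds of the same object file.
OLD_NM = "00000010 T foo\n00000020 t bar\n00000004 D data_sym\n"
NEW_NM = "00000018 T foo\n00000020 t bar\n0000000c T baz\n"

rows = compare_sizes(parse_sizes(OLD_NM), parse_sizes(NEW_NM))
for name, before, after, delta, percent in rows:
    pct = "new" if percent is None else "%+.1f%%" % percent
    print("%-10s %6d %6d %+6d %s" % (name, before, after, delta, pct))
```

Unchanged functions are dropped and data symbols are ignored, so the output lists only the functions that account for a code size change.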
[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86
--- Comment #10 from dann at godzilla dot ics dot uci dot edu 2005-11-03 00:53 --- (In reply to comment #9) > What are the flags for the sizes in comment #7 and comment #8? -O2 -march=i686 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153
[Bug target/23303] [4.1 Regression] 4.1 generates sall + addl instead of leal
--- Comment #7 from dann at godzilla dot ics dot uci dot edu 2005-11-01 15:15 --- (In reply to comment #5) > Hmm, > I am still not sure if it matters too much, but since there are actually > dupes of this problem, I think we can simply add peep2 fixing this > particular common case. > > I am testing attached patch. Could you please try to measure the code size impact of this patch? (like the examples in PR23153: xterm, PR8361 or kernel) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23303
[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced
--- Comment #13 from dann at godzilla dot ics dot uci dot edu 2005-10-31 04:50 --- (In reply to comment #12) > A more interesting test would be to see the Linux kernel size difference, There's such a comparison now in comment #8 in PR23153. It confirms the size increase. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2005-10-31 04:15 ---
More data, the Linux kernel compiled for i686:

size -f *
   text    data     bss     dec     hex filename
2625471  534012  611768 3771251  398b73 vmlinux.4.0
3023306  429364  347384 3800054  39fbf6 vmlinux.4.1

It would be good if someone else can try to reproduce this.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153
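The kernel figures in this comment can be broken down per section with a short script. This is only a sketch: the numbers are copied from the `size` output in the comment, while the helper names (`parse_size_output`, `percent_change`) are made up for illustration.

```python
# Figures copied from the `size` output quoted in the comment.
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
2625471  534012  611768 3771251  398b73 vmlinux.4.0
3023306  429364  347384 3800054  39fbf6 vmlinux.4.1
"""

SECTIONS = ("text", "data", "bss", "dec")


def parse_size_output(text):
    """Parse `size`-style output into {filename: {column: int}}."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        table[row["filename"]] = {key: int(row[key]) for key in SECTIONS}
    return table


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


table = parse_size_output(SIZE_OUTPUT)
old, new = table["vmlinux.4.0"], table["vmlinux.4.1"]

# Sanity check: the "dec" column should be the sum of the three sections.
for sizes in (old, new):
    assert sizes["dec"] == sizes["text"] + sizes["data"] + sizes["bss"]

for key in SECTIONS:
    print("%-4s %+.2f%%" % (key, percent_change(old[key], new[key])))
```

Run on these numbers, the text section grows by roughly 15% while the total grows by under 1%, because data and bss are smaller in the 4.1 build. The text column is therefore the one to watch for a code size regression.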
[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced
--- Comment #12 from dann at godzilla dot ics dot uci dot edu 2005-10-27 18:08 --- (In reply to comment #9) > And CSiBE tells you the story that GCC 4.1 produces smaller code overall. > http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux Well, it obviously depends on applications. The point of PR23153 is to show that there is a code size regression, and all the PRs that depend on it are showing very specific issues that cause a part of the regression. A more interesting test would be to see the Linux kernel size difference, because if there's any difference there would be some people screaming (unfortunately I won't be able to do that comparison anytime soon, hope someone else will). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced
--- Comment #8 from dann at godzilla dot ics dot uci dot edu 2005-10-27 16:43 ---
(In reply to comment #7)
> Could the dear reported at least try to provide a small test case?

The testcase in the attachment contains only one 4-line function, HandleDeIconify; the rest is just fluff to allow it to compile. Granted, a lot of it can be pruned, but I don't think that prevents anyone from debugging the problem.

> I think this should not be marked as a regression.

Why not? It is a regression.

> It's just sad that this
> kind of non-bug keeps the regression count high, when in reality GCC 4.1
> produces smaller code overall.

PR23153 tells a completely different story about code size (at least for i686).

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug rtl-optimization/24209] strange instruction selected for an annuled slot on sparc
--- Comment #1 from dann at godzilla dot ics dot uci dot edu 2005-10-05 05:13 --- Created an attachment (id=9889) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9889&action=view) preprocessed code for this bug -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24209
[Bug rtl-optimization/24209] New: strange instruction selected for an annuled slot on sparc
4.1 selects a strange instruction to put in the delay slot of a bl,a instruction because in the non-taken case the same instruction will be executed anyway...

-O2 code for 4.1:

PointToRowCol:
        save    %sp, -112, %sp
        sethi   %hi(term), %g1
        ld      [%g1+%lo(term)], %l2
        add     %l2, 136, %l1
        ld      [%l1+572], %l3
        ld      [%l1+772], %l0
        sub     %i0, %l3, %o0
        call    .div, 0
        ld      [%l0+20], %o1
        sethi   %hi(firstValidRow), %g1
        ld      [%g1+%lo(firstValidRow)], %i0
        cmp     %o0, %i0
        bl,a    .LL118
        ldsb    [%l2+1823], %g1    ;; this instruction
        sethi   %hi(lastValidRow), %g1
        ld      [%g1+%lo(lastValidRow)], %g1
        cmp     %o0, %g1
        bg      .LL116
        mov     %o0, %i0
.LL105:
        ldsb    [%l2+1823], %g1    ;; this will be executed on the
                                   ;; non-taken path
.LL118:
        cmp     %g1, 0
        bne     .LL110
        mov     0, %o0
        ld      [%l0+32], %o0
.LL110:
        add     %o0, %l3, %o0
        ld      [%l0+16], %o1
        call    .div, 0
        sub     %i1, %o0, %o0
        cmp     %o0, 0
        bl      .LL113
        mov     0, %g2
        ld      [%l1+888], %g1
        add     %g1, 1, %g1
        cmp     %o0, %g1
        bg      .LL117
        mov     %o0, %g2
.LL113:
        st      %i0, [%i2]
        st      %g2, [%i3]
        jmp     %i7+8
        restore
.LL117:
        st      %i0, [%i2]
        mov     %g1, %g2
        st      %g2, [%i3]
        jmp     %i7+8
        restore
.LL116:
        b       .LL105
        mov     %g1, %i0

The 4.0 code is:

PointToRowCol:
        save    %sp, -112, %sp
        sethi   %hi(term), %g1
        ld      [%g1+%lo(term)], %l2
        add     %l2, 136, %l1
        ld      [%l1+572], %l3
        sub     %i0, %l3, %o0
        ld      [%l1+772], %i0
        call    .div, 0
        ld      [%i0+20], %o1
        sethi   %hi(firstValidRow), %g1
        ld      [%g1+%lo(firstValidRow)], %g1
        cmp     %o0, %g1
        bl      .LL42
        mov     %o0, %l0
        sethi   %hi(lastValidRow), %g1
        ld      [%g1+%lo(lastValidRow)], %g1
        cmp     %o0, %g1
        bg,a    .LL32
        mov     %g1, %l0
.LL32:
        ldsb    [%l2+1823], %g1
        cmp     %g1, 0
        bne     .LL36
        mov     0, %o0
        ld      [%i0+32], %o0
.LL36:
        add     %o0, %l3, %o0
        ld      [%i0+16], %o1
        call    .div, 0
        sub     %i1, %o0, %o0
        cmp     %o0, 0
        bl,a    .LL43
        st      %l0, [%i2]
        ld      [%l1+888], %g1
        add     %g1, 1, %g1
        cmp     %o0, %g1
        bg,a    .LL39
        mov     %g1, %o0
.LL39:
        st      %l0, [%i2]
        st      %o0, [%i3]
        jmp     %i7+8
        restore
.LL42:
        b       .LL32
        mov     %g1, %l0
.LL43:
        mov     0, %o0
        st      %o0, [%i3]
        jmp     %i7+8

(the 4.0 code is a few bytes smaller)

I'll attach the preprocessed code.

--
Summary: strange instruction selected for an annuled slot on sparc
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: sparc-sun-solaris2.8

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24209
[Bug c/24068] Unconditional warning when using -combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-29 20:10 ---
(In reply to comment #9)
> Subject: Re: Unconditional warning when using -combine
>
> On Mon, Sep 26, 2005 at 08:46:20PM -0000, dann at godzilla dot ics dot uci dot edu wrote:
> > > So this about the following:
> > > int f(a)
> > > int a;
> > > {
> > > return a;
> > > }
> > > int f(int);
> > >
> > > Which is questionable.
> > >
> > > So I don't think this is not an inappropriate warning.
> >
> > It seems that the warning was designed for code like your example above.
> > But if you have 1 K&R file and one C90 file, then there should be no
> > warning...
> > Another bad thing is that if you swap the files on the command line then
> > you get no warning.
>
> There certainly should be a warning. It's not obvious on most targets
> with int, but what you're doing here won't work with float arguments;
> if the prototype includes an argument list, the definition should also.

Sorry, I am not sure I understand what you are referring to... In both the original bug report and Andrew's example above, the definition and the prototype each included an argument list with types for all the declarations.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug target/23302] [4.1 Regression] extra move generated on x86
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-28 17:29 --- (In reply to comment #2) > While it might be probably possible to design peephole or combiner insn patter > I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was > optimizing this by pure luck and the patch seems to have overall positive > effect > on code size... IMHO closing these bugs as WONTFIX is not the right thing to do. This is clearly a missed optimization opportunity. The fact that it worked by chance before your (overall good) patch does not make fixing this issue less desirable. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302
[Bug c/24068] Unconditional warning when using -combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-26 20:46 ---
(In reply to comment #4)
> So this about the following:
> int f(a)
> int a;
> {
> return a;
> }
> int f(int);
>
> Which is questionable.
>
> So I don't think this is not an inappropriate warning.

It seems that the warning was designed for code like your example above. But if you have 1 K&R file and one C90 file, then there should be no warning... Another bad thing is that if you swap the files on the command line then you get no warning.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug c/24068] Unconditional warning when using -combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-26 19:54 ---
(In reply to comment #4)
> Because one file uses K&R style function defintions and the other uses a prototype which is ANSI/ISO
> style.
> Simple example:
[snip]
> So I don't think this is not an inappropriate warning.

The question is: can this EVER result in incorrect behavior? Is it incorrect from the standard's point of view? If the answer to both is no, then there is no reason to warn.

>
> As an aside, I wish people would stop using K&R style C already.

Agreed.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug c/24068] Unconditional warning when using -fwhole-program
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-26 19:25 --- Created an attachment (id=9808) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9808&action=view) xlwmenu.i -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug c/24068] Unconditional warning when using -fwhole-program
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-26 19:25 --- Created an attachment (id=9807) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9807&action=view) xterm.i -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug c/24068] New: Unconditional warning when using -fwhole-program
When trying to compile the attached preprocessed files using

gcc -c -fwhole-program --combine xterm.i xlwmenu.i

these warnings are produced unconditionally:

/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:57: warning: prototype for 'x_alloc_nearest_color_for_widget' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:58: warning: prototype for 'x_alloc_lighter_color_for_widget' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:64: warning: prototype for 'x_clear_errors' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:65: warning: prototype for 'x_copy_dpy_color' follows non-prototype definition

AFAICT the warnings don't make much sense. The code is correct. The functions in question are defined in one file and then prototyped and used in the other file. This kind of stuff appears in countless C programs. Can this warning be turned off by default?

--
Summary: Unconditional warning when using -fwhole-program
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068
[Bug target/23828] local calling convention not used when using --combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-21 17:43 --- (In reply to comment #8) > (In reply to comment #4) > Instead of the above check, change it to: > if (local_regparm == 3 && DECL_STRUCT_FUNCTION (fn)->static_chain_decl) > local_regparm = 2; DECL_STRUCT_FUNCTION does not work, it ICEs when running the testsuite... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828
[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-13 23:09 ---
Additional data: For the testcase in PR8361:

size -f generate-3.4*.o
   text    data     bss     dec     hex filename
 297025       4     181  297210   488fa generate-3.4-4.0.o
 318366       8     181  318555   4dc5b generate-3.4-4.1.o

so about a 7% increase for 4.1

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153
[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-13 22:57 ---
(In reply to comment #6)
> Maybe a better check would be check in the decl's function struct's
> field static_chain_decl is set.

I am not sure I understand what you mean here... Maybe adding a test like this:

TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL

should work.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828
[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-13 22:36 ---
It looks like the -fwhole-program version of ClearLeft only passes the first 2 arguments to the ClearInLine call in registers; the 3rd one is passed on the stack. The reason for that is this code in i386.c:ix86_function_regparm:

  /* We can't use regparm(3) for nested functions as these use
     static chain pointer in third argument. */
  if (local_regparm == 3 && DECL_CONTEXT (decl)
      && !DECL_NO_STATIC_CHAIN (decl))
    local_regparm = 2;

The test for nested functions is incorrect: in the -fwhole-program case DECL_CONTEXT (DECL_for_ClearLeft) is a TRANSLATION_UNIT_DECL, so the test is true even though it should not be. Changing the code to:

  if (local_regparm == 3 && DECL_CONTEXT (decl)
      && (TREE_CODE (DECL_CONTEXT (decl)) != TRANSLATION_UNIT_DECL)
      && !DECL_NO_STATIC_CHAIN (decl))
    local_regparm = 2;

fixes the testcase. But the above just fixes the symptoms, it's probably not the correct way to test for a nested function.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828
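The difference between the two conditions quoted in this comment can be seen in a toy model. This is not GCC code: `Decl`, `regparm_original` and `regparm_patched` are hypothetical stand-ins that only mirror the boolean logic of the two `if` statements. The model also assumes that a file-scope function has no context at all in a single-file compile, which the comment implies but does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decl:
    """Toy stand-in for a GCC declaration node (hypothetical)."""
    kind: str                          # e.g. "FUNCTION_DECL", "TRANSLATION_UNIT_DECL"
    context: Optional["Decl"] = None   # models DECL_CONTEXT
    no_static_chain: bool = False      # models DECL_NO_STATIC_CHAIN


def regparm_original(decl, local_regparm=3):
    """Mirror of the original test: any non-null context counts as 'nested'."""
    if local_regparm == 3 and decl.context is not None and not decl.no_static_chain:
        local_regparm = 2
    return local_regparm


def regparm_patched(decl, local_regparm=3):
    """Mirror of the proposed test: a translation-unit context is not nesting."""
    if (local_regparm == 3 and decl.context is not None
            and decl.context.kind != "TRANSLATION_UNIT_DECL"
            and not decl.no_static_chain):
        local_regparm = 2
    return local_regparm


unit = Decl("TRANSLATION_UNIT_DECL")
outer = Decl("FUNCTION_DECL")

single_file = Decl("FUNCTION_DECL")                   # assumed: no context
whole_program = Decl("FUNCTION_DECL", context=unit)   # the case from the report
nested = Decl("FUNCTION_DECL", context=outer)         # a genuinely nested function

for label, decl in (("single file", single_file),
                    ("whole program", whole_program),
                    ("nested", nested)):
    print(label, regparm_original(decl), regparm_patched(decl))
```

In this model only the whole-program case changes: the original test drops it to two register parameters, the patched test keeps three, and the genuinely nested function is limited to two under both.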
[Bug c/23872] .t02.original dump weirdness
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-13 20:45 --- The fact that the dump is different depending on function order or compilation flags seems to point to either an uninitialized variable or some memory corruption. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23872
[Bug middle-end/23872] New: .t02.original dump weirdness
Using gcc -O2 -fdump-tree-all -S to compile:

int bar (void) { return 0;}
int foo (int reject) { int result = 0; return result;}

the .t02.original dump looks like:

;; Function bar (bar)
;; enabled by -tree-original
{
  return 0;
}

;; Function foo (foo)
;; enabled by -tree-original
{
  int result = 0;
  int result = 0;    <--- this line appears twice...
  return result;
}

If the order of the 2 functions is reversed in the file then the dump looks like:

;; Function foo (foo)
;; enabled by -tree-original
{
  int result = 0;    <--- the return does not appear...
}

;; Function bar (bar)
;; enabled by -tree-original
{
  return 0;
}

When using just -fdump-tree-original, the dump for "foo" always looks like the second version.

--
Summary: .t02.original dump weirdness
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23872
[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-12 23:30 --- (In reply to comment #1) > If it changes calling-conventions > in single-file compile mode the function must be declared static, so it > definitely may be changed in whole-program mode, too. Yep, both ClearLeft and ClearInLine are declared static. It's interesting that both ClearLeft and ClearInLine appear on the "Marking local functions:" line in the i00.cgraph dump. Can you confirm this bug? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828
[Bug rtl-optimization/23828] New: local calling convention not used when using -fwhole-program --combine
When compiling the files in the attachment for PR22574 using the command line:

gcc -fwhole-program --combine -march=i686 -O2 button.i charproc.i charsets.i cursor.i data.i doublechr.i fontutils.i input.i main.i menu.i misc.i print.i ptydata.i screen.i scrollbar.i tabs.i util.i xstrings.i VTPrsTbl.i -S

the function ClearLeft looks like:

ClearLeft.221553:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    3748(%eax), %ecx
        movl    3752(%eax), %edx
        movl    $0, (%esp)         ; <- 0 is passed on the stack
        incl    %ecx
        movl    %ecx, 4(%esp)
        call    ClearInLine.221545
        leave
        ret

When compiling just the file util.i that contains ClearLeft using -march=i686 -O2 the assembly is:

ClearLeft:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    3748(%eax), %ecx
        movl    3752(%eax), %edx
        incl    %ecx
        movl    %ecx, (%esp)
        xorl    %ecx, %ecx         ; 0 is passed in a register
        call    ClearInLine
        leave
        ret

When using -fwhole-program --combine the parameter "0" to the ClearInLine function is passed on the stack instead of being passed in a register. Is there a reason for that? Wouldn't it be better to pass it in a register?

--
Summary: local calling convention not used when using -fwhole-program --combine
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnus

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828
[Bug rtl-optimization/23524] bigger version of mov + cmp produced
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-09-07 22:05 ---
It seems that expand generates different insns in 4.0 and 4.1 for the comparison in question:

4.0 generates: (from .00.expand)

(insn 15 13 16 1 (set (reg/f:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 .core.widget_class+0 S4 A32])) -1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg/f:SI 63)
        (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] ) [5 xtermWidgetClass+0 S4 A32])) -1 (nil)
    (nil))

(insn 17 16 18 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/f:SI 62)
            (reg/f:SI 63))) -1 (nil)
    (nil))

4.1 generates:

(insn 15 13 16 1 (set (reg:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 .core.widget_class+0 S4 A32])) -1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 62)
            (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] ) [5 xtermWidgetClass+0 S4 A32]))) -1 (nil)
    (nil))

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug rtl-optimization/23523] code size regression on x86
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-08-25 05:43 ---
The issue is the peephole2 pass in 4.1. Before it the insn looks like:

(insn:HI 36 34 37 0 (set (mem/i:SI (symbol_ref:SI ("waiting_for_initial_map") [flags 0x40] ) [7 waiting_for_initial_map+0 S4 A32])
        (const_int 0 [0x0])) 34 {*movsi_1} (nil)
    (nil))

and after:

(insn 58 34 59 0 (parallel [
            (set (reg:SI 2 cx)
                (const_int 0 [0x0]))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 59 58 37 0 (set (mem/i:SI (symbol_ref:SI ("waiting_for_initial_map") [flags 0x40] ) [7 waiting_for_initial_map+0 S4 A32])
        (reg:SI 2 cx)) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 2 cx)
        (nil)))

4.0 uses "ax" instead of "cx". %eax is free at that point, so it is strange that it's not used in 4.1

--
           What    |Removed                     |Added
  BugsThisDependsOn|18427                       |
OtherBugsDependingO|                            |23153
              nThis|                            |
           Keywords|ra                          |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523
[Bug rtl-optimization/22563] [3.4/4.0/4.1 Regression] performance regression for gcc newer than 2.95
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-08-25 02:49 ---
This message: http://gcc.gnu.org/ml/gcc/2005-08/msg00208.html was asking for the reason for the slowdown for S05e.

AFAICT the inner loop for the benchmark (in s05e_test) gets compiled to:

.L153:
        fstl    (%edx)
        leal    8(%edx), %eax
        fstl    (%eax)
        fstl    8(%eax)
        fstl    16(%eax)
        fstl    24(%eax)
        fstl    32(%eax)
        fstl    40(%eax)
        fstl    48(%eax)
        leal    56(%eax), %edx
        cmpl    %edx, %ecx
        jne     .L153

and to:

.L9:
        movl    $0, (%edx)
        movl    $1074266112, 4(%edx)
        movl    $0, 8(%edx)
        movl    $1074266112, 12(%edx)
        movl    $0, 16(%edx)
        movl    $1074266112, 20(%edx)
        movl    $0, 24(%edx)
        movl    $1074266112, 28(%edx)
        movl    $0, 32(%edx)
        movl    $1074266112, 36(%edx)
        movl    $0, 40(%edx)
        movl    $1074266112, 44(%edx)
        movl    $0, 48(%edx)
        movl    $1074266112, 52(%edx)
        movl    $0, 56(%edx)
        movl    $1074266112, 60(%edx)
        addl    $64, %edx
        cmpl    %edx, %ebx
        jne     .L9

by 4.1

The 4.1 code looks much worse...

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22563
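A note on the constant in the 4.1 loop: 1074266112 is 0x40080000, which matches the upper 32 bits of the IEEE 754 double 3.0, whose lower 32 bits are zero. So the 4.1 code appears to store the same double as two 32-bit integer immediates per element, where the other version stores it from the x87 stack with a single fstl. That the benchmark's value is exactly 3.0 is an inference from the bit pattern, not something the report states. The helper name `split_double` in this quick check is made up.

```python
import struct


def split_double(value):
    """Return the (low, high) 32-bit halves of a little-endian IEEE 754 double."""
    raw = struct.pack("<d", value)        # 8 bytes, least significant byte first
    low, high = struct.unpack("<II", raw)
    return low, high


low, high = split_double(3.0)
print(low, high, hex(high))
```

On little-endian x86 the low word sits at the lower address, which matches the `movl $0, N(%edx)` / `movl $1074266112, N+4(%edx)` pairs in the loop.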
[Bug rtl-optimization/23524] bigger version of mov + cmp produced
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-08-23 18:15 --- (In reply to comment #4) > > Then use -Os every where instead. You will see that the overall code > size for 4.1 > has reduced from 4.0. That might be true, but -Os is not always an option. If there's a good reason for -O2 to generate bigger code, then so be it, but that does not seem to be the case for the code in this PR (at least AFAICT). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug rtl-optimization/23524] bigger version of mov + cmp produced
--- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-08-23 18:05 ---
(In reply to comment #2)
> You really should know that we only care about code size at -Os. We care about performance
> regressions though at -O2.

Code size is important for performance for modern processors. Small I-cache (and I-TLB) footprint for otherwise equivalent code results in better performance. BTW, this is a 4.1 regression.

--
           What    |Removed                     |Added
OtherBugsDependingO|                            |23153
              nThis|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
[Bug target/23525] New: inefficient parameter passing on x86
Compiling this code:

extern int waiting_for_initial_map;
extern int close (int __fd);
void first_map_occurred(void)
{
  close(cp_pipe[0]);
  close(pc_pipe[1]);
  waiting_for_initial_map = 0;
}

using -O2 -march=i686 4.[01] generate sequences like:

        movl    cp_pipe, %eax
        movl    %eax, (%esp)

for calling the close function. The Intel compiler generates:

        pushl   cp_pipe

--
Summary: inefficient parameter passing on x86
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23525