[Bug tree-optimization/17549] [4.0 Regression] 10% increase in codesize with C code compared to GCC 3.3
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-10 22:59 ---
.

--
           What             |Removed                  |Added
----------------------------------------------------------------------------
                     Status |ASSIGNED                 |RESOLVED
                 Resolution |                         |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From cvs-commit at gcc dot gnu dot org 2005-02-10 22:57 ---
Subject: Bug 17549

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	[EMAIL PROTECTED]	2005-02-10 22:57:31

Modified files:
	gcc            : ChangeLog tree-outof-ssa.c

Log message:
	PR tree-optimization/17549
	* tree-outof-ssa.c (find_replaceable_in_bb): Do not allow TER
	to replace a DEF with its expression if the DEF and the rhs of
	the expression we replace into have the same root variable.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7438&r2=2.7439
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-outof-ssa.c.diff?cvsroot=gcc&r1=2.43&r2=2.44

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
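[Editorial note] The committed check in find_replaceable_in_bb (refuse to substitute a DEF's expression into a statement whose lhs shares the DEF's root variable) can be sketched with a toy model. Everything below is invented for illustration: the statement encoding, the `root`/`ter` helpers, and the single-use assumption. GCC's real pass works on SSA operands and partitions, not tuples.

```python
# Toy sketch of TER (temporary expression replacement) with the
# root-variable guard from the fix above.  SSA names are strings like
# "x_3" (root "x", version 3); expressions are nested ("op", a, b) tuples.

def root(name):
    return name.split("_")[0]

def ter(stmts, guard=True):
    """Forward-substitute single-use defs into later statements.

    With guard=True, a def is never substituted into a statement whose
    lhs has the same root variable, so accumulations like x = x * a
    keep their intermediate names (and short live ranges).
    """
    repl = {}  # ssa name -> its (possibly already substituted) expression

    def subst(e, lhs):
        if isinstance(e, tuple):
            return (e[0],) + tuple(subst(a, lhs) for a in e[1:])
        if e in repl and not (guard and root(e) == root(lhs)):
            return repl[e]
        return e

    out = []
    for lhs, rhs in stmts:
        rhs = subst(rhs, lhs) if isinstance(rhs, tuple) else rhs
        repl[lhs] = rhs  # assume each def has a single use in this toy
        out.append((lhs, rhs))
    return out

# The accumulator from the discussion: x = a + b; x = x * a; x = x * b
stmts = [("x_3", ("+", "a_1", "b_2")),
         ("x_4", ("*", "a_1", "x_3")),
         ("x_5", ("*", "b_2", "x_4"))]
```

Without the guard the last statement becomes one fully nested expression (every operand stays live to the end); with the guard each step still reads the previous x version.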
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-02-10 21:11 ---
(In reply to comment #37)
> So for ppc this bug is still not fixed even with my patch.  Interesting data
> point is the ppc32 size with -Os -fno-ivopts:
>    2820       0       0    2820     b04 no-ivopts.o
>
> So perhaps the pending IVopts patches will also help for this problem.

I bet the problem here for PPC is the same as PR 18219, where we generate more
than one IV.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-10 10:06 ---
'size' for susan_edged_mod_1 .o files:

33      = pre 3.3.3-suse (hammer branch)
40      = CVS head 20050209
patched = CVS head 20050209 with the 'TER hack' patch applied

i686:
   text    data     bss     dec     hex filename
   2133       0       0    2133     855 33.o
   3003       0       0    3003     bbb 40.o
   2237       0       0    2237     8bd patched.o

amd64:
   text    data     bss     dec     hex filename
   2710       0       0    2710     a96 33.o
   3414       0       0    3414     d56 40.o
   2421       0       0    2421     975 patched.o

ppc32:
   text    data     bss     dec     hex filename
   2780       0       0    2780     adc 33.o
   3348       0       0    3348     d14 40.o
   3140       0       0    3140     c44 patched.o

So for ppc this bug is still not fixed even with my patch.  An interesting
data point is the ppc32 size with -Os -fno-ivopts:

   2820       0       0    2820     b04 no-ivopts.o

So perhaps the pending IVopts patches will also help for this problem.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
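[Editorial note] For scale, the text-size deltas in the table above work out as follows. This is a quick computation over the numbers reported in the comment, not data from the bug report itself:

```python
# Percentage change in .text size relative to the GCC 3.3 baseline,
# using the 'size' numbers reported above for susan_edged_mod_1.
sizes = {  # arch: (3.3 baseline, CVS head 20050209, head + TER hack)
    "i686":  (2133, 3003, 2237),
    "amd64": (2710, 3414, 2421),
    "ppc32": (2780, 3348, 3140),
}

def pct(base, new):
    return round((new - base) / base * 100, 1)

deltas = {arch: (pct(b, head), pct(b, patched))
          for arch, (b, head, patched) in sizes.items()}
for arch, (head, patched) in deltas.items():
    print(f"{arch}: head {head:+.1f}%, patched {patched:+.1f}%")
```

On i686 the hack cuts the regression from about 41% to about 5%, amd64 even ends up smaller than 3.3, while ppc32 stays roughly 13% larger, which matches the remark that ppc is not fixed.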
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-10 09:08 ---
The slowdown is probably some unfortunate icache effect; it could be anything
from alignment to the slightly larger instructions due to using %r8 instead of
%rcx.  I guess we should not care too much about such random effects that we
cannot do anything about anyway.

I'm going to see if it doesn't hurt on i686, and submit the patch if things
look good.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-09 23:35 ---
The entire diff of .optimized dumps and .s output for twolf on AMD64 is really
small; in fact, the asm output is different for only one file:

 config1.c.t65.optimized   |  120 ++
 configure.c.t65.optimized |   78 +++--
 outpins.c.t65.optimized   |    6 +-
 outpins.s                 |   36 ++---
 qsorte.c.t65.optimized    |    3 -
 qsortg.c.t65.optimized    |    3 -
 qsortgdx.c.t65.optimized  |    3 -
 qsortx.c.t65.optimized    |    3 -
 readcell.c.t65.optimized  |    3 -
 readseg.c.t65.optimized   |    6 +-
 ucgxp.c.t65.optimized     |    3 -
 uloop.c.t65.optimized     |    6 +-
 12 files changed, 174 insertions(+), 96 deletions(-)

The file with the assembler difference is outpins.c.  The relevant diff is
below.  There is nothing in the diff that explains the ~4% slowdown I see in
my SPEC benchmarks (3 runs, so the slowdown is consistent).  The same
instructions are there, just ordered differently and using different
registers.  So I'm not sure how to proceed...

diff -u base/outpins.c.t65.optimized hacked/outpins.c.t65.optimized
--- base/outpins.c.t65.optimized	2005-02-10 00:19:20.950581229 +0100
+++ patched/outpins.c.t65.optimized	2005-02-10 00:16:19.436444879 +0100
@@ -99,8 +99,9 @@
   pairArray.39 = pairArray;
   carray.40 = carray;
   D.3698 = *((struct cellbox * *) ((long unsigned int) *(*((int * *) D.3712 + pairArray.39 - 8B) + 4B) * 8) + carray.40);
+  end.81 = D.3698->cxcenter + (int) D.3698->tileptr->left;
   temp.59 = *(carray.40 + (struct cellbox * *) ((long unsigned int) *(*(pairArray.39 + (int * *) D.3712) + 4B) * 8));
-  end = MAX_EXPR <D.3698->cxcenter + (int) D.3698->tileptr->left, temp.59->cxcenter + (int) temp.59->tileptr->left>;
+  end = MAX_EXPR <end.81, temp.59->cxcenter + (int) temp.59->tileptr->left>;
   :;
   return end;
@@ -228,9 +229,10 @@
   D.3668 = *((int * *) D.3664 + pairArray.36 - 8B);
   carray.37 = carray;
   D.3646 = *((struct cellbox * *) ((long unsigned int) *(D.3668 + (int *) ((long unsigned int) *D.3668 * 4)) * 8) + carray.37);
+  end.121 = D.3646->cxcenter + (int) D.3646->tileptr->right;
   D.3676 = *(pairArray.36 + (int * *) D.3664);
   temp.99 = *(carray.37 + (struct cellbox * *) ((long unsigned int) *(D.3676 + (int *) ((long unsigned int) *D.3676 * 4)) * 8));
-  end = MIN_EXPR <D.3646->cxcenter + (int) D.3646->tileptr->right, temp.99->cxcenter + (int) temp.99->tileptr->right>;
+  end = MIN_EXPR <end.121, temp.99->cxcenter + (int) temp.99->tileptr->right>;
   :;
   return end;

diff -u base/outpins.s hacked/outpins.s
--- base/outpins.s	2005-02-10 00:19:21.064543028 +0100
+++ patched/outpins.s	2005-02-10 00:16:19.551406289 +0100
@@ -18,18 +18,18 @@
 	movq	-8(%rdx,%rcx), %rax
 	movslq	4(%rax),%rax
 	movq	(%rsi,%rax,8), %rdi
+	movq	40(%rdi), %rax
+	movswl	(%rax),%r8d
 	movq	(%rcx,%rdx), %rax
+	addl	12(%rdi), %r8d
 	movslq	4(%rax),%rax
 	movq	(%rsi,%rax,8), %rdx
-	movq	40(%rdi), %rax
-	movswl	(%rax),%ecx
 	movq	40(%rdx), %rax
-	addl	12(%rdi), %ecx
 	movswl	(%rax),%eax
 	addl	12(%rdx), %eax
-	cmpl	%eax, %ecx
-	cmovl	%eax, %ecx
-	movl	%ecx, %eax
+	cmpl	%eax, %r8d
+	cmovl	%eax, %r8d
+	movl	%r8d, %eax
 	ret
 	.p2align 4,,7
 .L11:
@@ -40,9 +40,9 @@
 	movq	carray(%rip), %rax
 	movq	(%rax,%rdx,8), %rdx
 	movq	40(%rdx), %rax
-	movswl	(%rax),%ecx
-	addl	12(%rdx), %ecx
-	movl	%ecx, %eax
+	movswl	(%rax),%r8d
+	addl	12(%rdx), %r8d
+	movl	%r8d, %eax
 	ret
 	.p2align 4,,7
 .L12:
@@ -72,18 +72,18 @@
 	movslq	(%rcx),%rax
 	movslq	(%rcx,%rax,4),%rax
 	movq	(%rdi,%rax,8), %rcx
+	movq	40(%rcx), %rax
+	movswl	2(%rax),%r8d
 	movslq	(%rdx),%rax
+	addl	12(%rcx), %r8d
 	movslq	(%rdx,%rax,4),%rax
 	movq	(%rdi,%rax,8), %rdx
-	movq	40(%rcx), %rax
-	movswl	2(%rax),%esi
 	movq	40(%rdx), %rax
-	addl	12(%rcx), %esi
 	movswl	2(%rax),%eax
 	addl	12(%rdx), %eax
-	cmpl	%eax, %esi
-	cmovg	%eax, %esi
-	movl	%esi, %eax
+	cmpl	%eax, %r8d
+	cmovg	%eax, %r8d
+	movl	%r8d, %eax
 	ret
 	.p2align 4,,7
 .L22:
@@ -95,9 +95,9 @@
 	movq	carray(%rip), %rax
 	movq	(%rax,%rdx,8), %rdx
 	movq	40(%rdx), %rax
-	movswl	2(%rax),%esi
-	addl	12(%rdx), %esi
-	movl	%esi, %eax
+	movswl	2(%rax),%r8d
+	addl	12(%rdx), %r8d
+	movl	%r8d, %eax
 	ret
 	.p2align 4,,7
 .L23:

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-09 22:00 --- My TER hack does fix most of the problems, but it also causes a significant regression in the SPEC twolf benchmark. All other benchmarks are roughly the same. I'll try to figure out what is causing the regression. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-08 14:26 ---
(In reply to comment #28)
> Using var_to_partition does not help.  The reason is that the SSA names with
> the same root var are not in the same partition, e.g.
>
> _7  -->
> x_3  -->  x
> x_4 not coalesced with x  -->  New temp: 'x.0'
> x_5 not coalesced with x.0  -->  New temp: 'x.1'
<...>
> Partition 0 (a - 1 )
> Partition 1 (b - 2 )
> Partition 2 (x - 3 )
> Partition 3 (x.0 - 4 )
> Partition 4 (x.1 - 5 )
> Partition 5 ( - 7 )
>
> So if you replace the root var comparison in my hack with a check to make
> sure def and def2 are not in the same partition, that whole check will
> always be false and you still get crap code.

Of course.  Doh.  Accumulation will result in live range splitting, so they
will all be different variables.  Stick with checking the root variable; it's
probably our simplest measure, I guess. :-)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-08 14:02 ---
(In reply to comment #30)
> Subject: Re: [4.0 Regression] 10% increase in codesize with C code compared
> to GCC 3.3
>
> On Mon, Feb 07, 2005 at 11:13:27PM -, steven at gcc dot gnu dot org wrote:
> > x = a + b;
> > x = x * a;
> > x = x * b;
> ...
> > After Coalescing:
> ...
> > Partition 2 (x_3 - 3 )
> > Partition 3 (x_4 - 4 )
> > Partition 4 (x_5 - 5 )
>
> That is curious.  Certainly not the way I'd have expected things to work.
> Why are we not coalescing here?  Do we think that x_4 as an input to the
> same insn that creates x_5 means that the two conflict?  Unless someone
> can convince me otherwise, I'd call this a bug.

No, we don't think that, but they are disjoint live ranges, so we want to keep
them separate.  We do live range splitting on the way out of SSA.  There is no
conflict, just no reason to coalesce them.  If there were a copy between them,
then we would consider coalescing them.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
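[Editorial note] The point above, that each x version dies exactly where the next is born, so the ranges are disjoint and no copy connects them, can be checked with a small script. The statement list and the half-open interval convention are my own illustration of the example from the thread, not GCC's liveness representation:

```python
# Live ranges of the accumulator's SSA versions in the running example.
# Each version dies exactly where the next one is born, so the ranges are
# disjoint: the coalescer sees no conflict, but also no copy connecting
# them, so out-of-SSA keeps them in separate partitions.
stmts = [
    ('x_3', ['a_1', 'b_2']),   # x = a + b
    ('x_4', ['a_1', 'x_3']),   # x = x * a
    ('x_5', ['b_2', 'x_4']),   # x = x * b
    (None,  ['x_5']),          # return x
]

def live_range(name):
    d = next(i for i, (lhs, _) in enumerate(stmts) if lhs == name)
    u = max(i for i, (_, uses) in enumerate(stmts) if name in uses)
    return (d, u)   # live on the half-open interval (d, u]

def overlap(a, b):
    return max(a[0], b[0]) < min(a[1], b[1])

ranges = {n: live_range(n) for n in ('x_3', 'x_4', 'x_5')}
```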
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-08 00:15 ---
Might as well make it mine while I'm looking at it.

--
           What             |Removed                  |Added
----------------------------------------------------------------------------
                 AssignedTo |unassigned at gcc dot gnu|steven at gcc dot gnu dot
                            |dot org                  |org
                     Status |NEW                      |ASSIGNED
           Last reconfirmed |2005-02-07 22:13:59      |2005-02-08 00:15:16
                       date |                         |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From rth at gcc dot gnu dot org 2005-02-07 23:36 ---
Subject: Re: [4.0 Regression] 10% increase in codesize with C code compared
to GCC 3.3

On Mon, Feb 07, 2005 at 11:13:27PM -, steven at gcc dot gnu dot org wrote:
> x = a + b;
> x = x * a;
> x = x * b;
...
> After Coalescing:
...
> Partition 2 (x_3 - 3 )
> Partition 3 (x_4 - 4 )
> Partition 4 (x_5 - 5 )

That is curious.  Certainly not the way I'd have expected things to work.
Why are we not coalescing here?  Do we think that x_4 as an input to the
same insn that creates x_5 means that the two conflict?  Unless someone
can convince me otherwise, I'd call this a bug.


r~

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-07 23:16 ---
Note the following:

x_4 not coalesced with x  -->  New temp: 'x.0'
x_5 not coalesced with x.0  -->  New temp: 'x.1'

Not very useful, because x_4 and x_5 have no uses left.  So you start with
this:

foo (xD.1447, aD.1448, bD.1449)
{
  intD.0 D.1452;

  # BLOCK 0
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  xD.1447_3 = aD.1448_1 + bD.1449_2;
  xD.1447_4 = aD.1448_1 * xD.1447_3;
  xD.1447_5 = bD.1449_2 * xD.1447_4;
  return xD.1447_5;
  # SUCC: EXIT [100.0%]
}

and you end with this:

foo (xD.1447, aD.1448, bD.1449)
{
  intD.0 x.1D.1456;
  intD.0 x.0D.1455;
  intD.0 D.1452;

  # BLOCK 0
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  return bD.1449 * aD.1448 * (aD.1448 + bD.1449);
  # SUCC: EXIT [100.0%]
}

Note the redundant temporaries.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-07 23:13 ---
Using var_to_partition does not help.  The reason is that the SSA names with
the same root var are not in the same partition, e.g.

int foo (int x, int a, int b)
{
  x = a + b;
  x = x * a;
  x = x * b;
  return x;
}

-->

Sorted Coalesce list:

Partition map
Partition 0 (x_3 - 3 )
Partition 1 (x_4 - 4 )
Partition 2 (x_5 - 5 )

After Coalescing:

Partition map
Partition 0 (a_1 - 1 )
Partition 1 (b_2 - 2 )
Partition 2 (x_3 - 3 )
Partition 3 (x_4 - 4 )
Partition 4 (x_5 - 5 )
Partition 5 (_7 - 7 )

Replacing Expressions

x_3 replace with --> a_1 + b_2
x_4 replace with --> a_1 * x_3
x_5 replace with --> b_2 * x_4

_7  -->
x_3  -->  x
x_4 not coalesced with x  -->  New temp: 'x.0'
x_5 not coalesced with x.0  -->  New temp: 'x.1'
b_2  -->  b
a_1  -->  a

After Root variable replacement:

Partition map
Partition 0 (a - 1 )
Partition 1 (b - 2 )
Partition 2 (x - 3 )
Partition 3 (x.0 - 4 )
Partition 4 (x.1 - 5 )
Partition 5 ( - 7 )

So if you replace the root var comparison in my hack with a check to make sure
def and def2 are not in the same partition, that whole check will always be
false and you still get crap code.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-03 16:05 ---
(In reply to comment #26)
> > Are we looking to do this at -O2 as well?  I guess that's a key question.
> > At just -Os, it might very well be sufficient.
>
> As stevenb noted today in IRC, the code reduction substantially comes from
> less spilling code being emitted, which also means that the generated code
> *will* run faster.  I guess it's better to have it at -O2 too (until the
> register allocator gets fixed, but hey...)

You missed my point.  Sure, in the cases where we end up spilling less, it's
likely to run faster.  But we are going to miss combine opportunities in other
cases where we wouldn't have spilled in the first place.

If we are going to do this at >= -O2, we're probably better off actually
looking at how many live ranges we are introducing to solve the issue.  If the
big expression doesn't introduce too many live ranges, we might as well allow
TER to continue and give the expanders a better view.  That's why TER exists
in the first place.

In any case, it's all heuristical, so there will always be cases it doesn't
work right for.  If we measure SPEC or something else and the simple root-var
approach works fine, then go with it.  If there are issues, then we can visit
something a bit different.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From giovannibajo at libero dot it 2005-02-03 15:15 --- > Are we looking to do this at -O2 as well? I guess thats a key question. > at just -Os, it might very well be sufficient. As stevenb noted today in IRC, the code reduction substantially comes from less spilling code being emitted, which also means that the generated code *will* run faster. I guess it's better to have it at -O2 too (until the register allocator gets fixed, but hey...) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-03 14:37 ---
(In reply to comment #23)
> We have incoming into out-of-ssa,
>
>	x.1 = exp1
>	x.2 = x.1 + exp2
>	x.3 = x.2 + exp3
>
> We're currently allowing TER to produce
>
>	x.3 = exp1 + exp2 + exp3
>
> What if we were to disable TER substitution when the base variable on the
> lhs matches the base variable on the rhs?  So in this case we'd notice x.1
> and x.2 have the same base variable and not merge.  And (more importantly)
> so forth so that the definitions of x.20 and x.19 aren't merged either.

Of course, the real issue isn't just the root variable, is it?  We could have
TER introduce an expression with 1000 different base variables, and then the
problem is that we've got 1000 live ranges instead of something a little more
sane by accumulating along the way.

In the above example, it depends on what is in exp1, exp2 and exp3 whether you
want to avoid the substitution.  If all 3 exps end up equating to y.6 + y.6,
you probably DO want to let it happen, so you end up with

	x.3 = y.6 + y.6 + y.6 + y.6 + y.6 + y.6

> This can probably still fall down, especially when a lot of our variables
> get replaced by "ivtmp.x" and "pretmp.y", but at least we'll have some cut
> off that handles accumulation naturally.  Perhaps there's some looser notion
> of "base variable" we can use, like "in the same partition", or something.

The partitions have been decided by the time TER runs, so we know which
SSA_NAME is going to be a different variable and which isn't.  All you need to
do is compare var_to_partition(ssa_name) to see if they are in the same
partition.

I suspect that anything easy we can do will have performance slowdowns if we
do it at >= -O2, but you never know :-).  I suspect the better solution is to
limit the number of live ranges TER can introduce to something sane.  I don't
think that is likely to be too hard to code; the question is how quickly it
can run :-)  At every expression replacement point, we'd have to make a quick
run through the operands and count unique partitions.

Are we looking to do this at -O2 as well?  I guess that's a key question.  At
just -Os, it might very well be sufficient.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
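[Editorial note] The alternative throttle described above, counting how many distinct partitions (live ranges) a substituted expression would reference and refusing past some cap, might look like the sketch below. The expression encoding, the partition assignment, and the cap value are all invented for illustration; GCC would consult var_to_partition on real SSA names:

```python
# Hypothetical TER throttle: allow a substitution only if the resulting
# expression references at most `cap` distinct partitions (live ranges).

def unique_partitions(expr, part):
    """expr: nested ("op", ...) tuples with SSA-name strings at the leaves;
    part: mapping from SSA name to its out-of-SSA partition number."""
    if isinstance(expr, tuple):
        found = set()
        for arg in expr[1:]:
            found |= unique_partitions(arg, part)
        return found
    return {part[expr]}

def ok_to_substitute(expr, part, cap=4):
    return len(unique_partitions(expr, part)) <= cap

# Made-up partition assignment for a few SSA names.
part = {"a_1": 0, "b_2": 1, "y_6": 2, "c_3": 3, "d_4": 4, "e_5": 5}

# Textually large but referencing only two partitions: fine to substitute.
narrow = ("*", "b_2", ("*", "a_1", ("+", "a_1", "b_2")))
# Six distinct partitions: refused under a cap of 4.
wide = ("+", "a_1", ("+", "b_2", ("+", "c_3", ("+", "d_4", ("+", "e_5", "y_6")))))
```

This also captures the y.6 + y.6 point made above: a long accumulation over one variable touches a single partition and would still be allowed.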
--- Additional Comments From rth at gcc dot gnu dot org 2005-02-03 06:37 ---
We have incoming into out-of-ssa,

	x.1 = exp1
	x.2 = x.1 + exp2
	x.3 = x.2 + exp3

We're currently allowing TER to produce

	x.3 = exp1 + exp2 + exp3

What if we were to disable TER substitution when the base variable on the lhs
matches the base variable on the rhs?  So in this case we'd notice x.1 and x.2
have the same base variable and not merge.  And (more importantly) so forth so
that the definitions of x.20 and x.19 aren't merged either.

This can probably still fall down, especially when a lot of our variables get
replaced by "ivtmp.x" and "pretmp.y", but at least we'll have some cut off
that handles accumulation naturally.  Perhaps there's some looser notion of
"base variable" we can use, like "in the same partition", or something.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-03 03:27 ---
(In reply to comment #20)
> (In reply to comment #19)
> > The balance of the problem appears to come from TER.  We're building very
> > large constructs, such as
>
> Hmm, shouldn't the register allocator fix this anyways (yes I know we have a
> semi-stupid one but still).

Someday it will, but since it doesn't now, we will have to fix it in TER.  TER
is a temporary measure and will cease to exist someday too.  All we need is
smarter expanders.  It's all on a list somewhere :-)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From amacleod at redhat dot com 2005-02-03 03:23 ---
(In reply to comment #19)
> The balance of the problem appears to come from TER.  We're building very
> large constructs, such as
...
> I'm not sure how best to attack this, Andrew.  I think it's clear that
> turning off TER entirely isn't the correct option, but I'm not sure how to
> throttle it to keep things from running completely off the rails.  Thoughts?

That's a good question.  It's not like TER is simple to throttle.  I'll have a
look at it tomorrow.

We might be able to do something about counting the number of temporaries we
substitute into a stmt, and cease and desist if we are going to increase the
stmt size to more than some number, like 10 or 8 or 12 or 20 or NUMREGS/4 or
some such thing.  We might also only want to do it for MODIFY_EXPRs.  I'd have
to accumulate it per expression, but it shouldn't be too bad.

I'll see if anything else occurs to me overnight or tomorrow.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-02-03 02:38 ---
(In reply to comment #19)
> The balance of the problem appears to come from TER.  We're building very
> large constructs, such as

Hmm, shouldn't the register allocator fix this anyways (yes I know we have a
semi-stupid one but still).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549
--- Additional Comments From rth at gcc dot gnu dot org 2005-02-03 02:23 ---
The balance of the problem appears to come from TER.  We're building very
large constructs, such as

  [z.c : 76] x.182 = -temp.218 + temp.220 - temp.688 - temp.222 + temp.224 +
    temp.692 - temp.226 * 3 - temp.227 * 2 - temp.228 + temp.230 +
    temp.231 * 2 + temp.232 * 3 -
    (int) *(cp - (uchar *) (unsigned int) *(temp.677 - 6B)) * 3 -
    (int) *(cp - (uchar *) (unsigned int) *(temp.677 + -5B)) * 2 -
    (int) *(cp - (uchar *) (unsigned int) *(temp.677 + -4B)) +
    (int) *(cp - (uchar *) (unsigned int) *(temp.677 + -2B)) +
    (int) *(cp - (uchar *) (unsigned int) *(temp.677 + -1B)) * 2 +
    (int) *(cp - (uchar *) (unsigned int) *temp.677) * 3 - temp.239 * 3 -
    temp.240 * 2 - temp.241 + temp.243 + temp.244 * 2 + temp.245 * 3 -
    temp.699 - temp.247 + temp.249 + temp.703 - temp.251 + temp.253;

Instead of simply accumulating into X at each step, we're extending the
lifetime of all of the temporaries until the end.  Ouch.

Anyway, if I use -Os -fno-tree-ter, the stack frame size drops to 40 bytes
from 256.  And object size is much nicer too:

   text    data     bss     dec     hex filename
   2136       0       0    2136     858 a.out-34
   2006       0       0    2006     7d6 a.out-noter
   2809       0       0    2809     af9 a.out-ter

I'm not sure how best to attack this, Andrew.  I think it's clear that turning
off TER entirely isn't the correct option, but I'm not sure how to throttle it
to keep things from running completely off the rails.  Thoughts?

--
           What             |Removed                  |Added
----------------------------------------------------------------------------
                         CC |                         |amacleod at redhat dot com
                  Component |middle-end               |tree-optimization
           Last reconfirmed |2004-09-18 17:05:39      |2005-02-03 02:23:45
                       date |                         |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17549