[Bug tree-optimization/33604] [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2
--- Comment #32 from paolo dot bonzini at lu dot unisi dot ch 2007-12-04 12:38 ---
Subject: Re: [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2

> The difference between 4.2 and 4.3 is not as big but is still there: 0.7s vs. 1.6s

Well, that's more than 2x. So, we could say that the penultimate testcase is good enough.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
[Bug rtl-optimization/34171] [4.3 Regression] Segfault in df_chain_remove_problem with -O3 on alpha
--- Comment #6 from paolo dot bonzini at lu dot unisi dot ch 2007-11-21 12:51 ---
Subject: Re: [4.3 Regression] Segfault in df_chain_remove_problem with -O3 on alpha

So it means the basic block has been deleted.

I want to see what happens if I consolidate all the out_of_date_transfer_functions bitmaps into one. Seongbae, do you remember experimenting with anything like that?

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34171
[Bug target/34067] [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin
--- Comment #25 from paolo dot bonzini at lu dot unisi dot ch 2007-11-14 13:24 ---
Subject: Re: [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin

Yes, it does. Thanks a lot for the quick fix.

Note that even if the patch is committed, the bug should stay open because, as I said, there is probably a latent bug somewhere.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34067
[Bug target/34067] [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin
--- Comment #15 from paolo dot bonzini at lu dot unisi dot ch 2007-11-13 11:43 ---
Subject: Re: [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin

> bzip2 tar archive with four directories r42_O1, r42_O2, r43_O1, and r43_O2
> containing the result of -fdump-tree-all and the assembly codes for the
> different revisions and options (see comment #2).

Please use -fdump-rtl-all since fwprop is not a tree pass. :-)

Also, please check whether the bug appears or disappears with -O2 -fno-forward-propagate -funroll-loops versus -O2 -funroll-loops in both 130042 and 130043. If -fno-forward-propagate makes a difference, this would be a great way to narrow down the bug.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34067
[Bug target/34067] [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin
--- Comment #19 from paolo dot bonzini at lu dot unisi dot ch 2007-11-13 16:46 ---
Subject: Re: [4.3 regression] gfortran.dg/char_cshift_2.f90 fails with -O3 -funroll-loops fails on Intel Darwin

> [ibook-dhum] f90/bug% gfc -O1 -funroll-loops -fschedule-insns -fregmove -fexpensive-optimizations -fforward-propagate char_cshift_2_red_1.f90
> [ibook-dhum] f90/bug% a.out
> test 2
> 'adf' 'acf' 'adf' 1 1 2 1
> 'bdf' 'bcf' 'bdf' 2 1 2 1
> 'aef' 'adf' 'aef' 1 2 3 1
> 'bef' 'bdf' 'bef' 2 2 3 1
> 'acf' 'aef' 'acf' 1 3 1 1
> 'bcf' 'bef' 'bcf' 2 3 1 1
> Abort
> [ibook-dhum] f90/bug% gfc -O1 -funroll-loops -fregmove -fexpensive-optimizations -fforward-propagate char_cshift_2_red_1.f90
> [ibook-dhum] f90/bug% a.out
> test 2
> 'bdf' 'bcf' 'bdf' 2 1 2 1
> 'bef' 'bdf' 'bef' 2 2 3 1
> 'bcf' 'bef' 'bcf' 2 3 1 1
> Abort

Can you attach a -fdump-rtl-all tarball for these two sets of options for revision 130137? Thanks!

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34067
[Bug middle-end/33713] [4.3 Regression] can't find a register in class 'GENERAL_REGS' while reloading 'asm'
--- Comment #10 from paolo dot bonzini at lu dot unisi dot ch 2007-11-08 16:32 ---
Subject: Re: [4.3 Regression] can't find a register in class 'GENERAL_REGS' while reloading 'asm'

ubizjak at gmail dot com wrote:
> --- Comment #9 from ubizjak at gmail dot com 2007-11-07 18:56 ---
> (In reply to comment #8)
> So if we can agree to dump -fforce-addr: yes, please.
>
> Sometimes -fforce-addr produces faster code, as claimed in
> http://gcc.gnu.org/ml/fortran/2007-10/msg00048.html

My SPEC2000 run has not finished, but the results so far give a different overall picture (without -fforce-addr - with -fforce-addr):

+gzip:    156 - 167
+vpr:     165 - 168
+gcc:      68 -  70
 mcf:     167 - 167
 crafty:   95 -  95
 parser:  203 - 204
+perlbmk: 118 - 124
+gap:      74 -  77
-vortex:  142 - 136
+bzip2:   153 - 156
+twolf:   223 - 236
 wupwise: 119 - 120
-swim:    180 - 175
 mgrid:   249 - 251
 applu:   193 - 194
+mesa:    179 - 182
+galgel:  145 - 149
 art:     426 - 427
-equake:   86 -  79
+facerec: 190 - 193

Probably it would be better to spend time distilling a simple testcase from equake, which is a relatively small (1500 lines) C program, so that the benefit is there with regular -O2.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33713
[Bug rtl-optimization/34012] [4.3 Regression] Pessimization caused by fwprop
--- Comment #5 from paolo dot bonzini at lu dot unisi dot ch 2007-11-07 13:53 ---
Subject: Re: [4.3 Regression] Pessimization caused by fwprop

> BTW, why don't you use just rtx_cost instead of insn_rtx_cost? In each case
> you have an insn, so you can do single_set on it and run
> rtx_cost (SET_SRC (set), SET) on it directly.

You're right. I was just mimicking combine to see if the problem was simple to solve or more deeply rooted.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34012
[Bug rtl-optimization/34012] [4.3 Regression] Pessimization caused by fwprop
--- Comment #8 from paolo dot bonzini at lu dot unisi dot ch 2007-11-08 06:10 ---
Subject: Re: [4.3 Regression] Pessimization caused by fwprop

jakub at gcc dot gnu dot org wrote:
> --- Comment #7 from jakub at gcc dot gnu dot org 2007-11-07 20:18 ---
> Created an attachment (id=14502)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14502&action=view)
> gcc43-pr34012.patch
>
> Updated patch with testcase.
>
> Paolo, are you bootstrapping/regtesting this or should I?
> I can do that on x86_64-linux, ppc64-linux and ia64-linux overnight.

I could have done it only on Monday and only for x86_64 (CPU time for the local x86_64 is reserved by another guy till then...), so yes, it's much better if you do it. Thanks!

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34012
[Bug tree-optimization/33604] [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2
--- Comment #22 from paolo dot bonzini at lu dot unisi dot ch 2007-11-08 06:12 ---
Subject: Re: [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2

jacob at math dot jussieu dot fr wrote:
> --- Comment #21 from jacob at math dot jussieu dot fr 2007-11-07 20:58 ---
> Hi,
>
> I'm the guy behind Eigen, from which the initial testcase is taken.
>
> Would it help the compiler if I removed most of the const keywords in my
> code?

I'm not an expert on expression templates, but I don't think so. const qualifiers are mostly ignored by the compiler for optimization purposes, but removing them would be bad for the users, I think.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
[Bug tree-optimization/33604] [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2
--- Comment #13 from paolo dot bonzini at lu dot unisi dot ch 2007-11-07 06:03 ---
Subject: Re: [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2

> I don't think we want to start playing with the heuristics ;) That patch
> certainly will cause compile-time and memory usage regressions.

Not for -O2, since the default AVG_ALIASED_VOPS is 1. For -O3 it is 3, not huge either.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
[Bug rtl-optimization/33552] wrong code for multiple output asm, wrong df?
--- Comment #6 from paolo dot bonzini at lu dot unisi dot ch 2007-09-25 14:22 ---
Subject: Re: wrong code for multiple output asm, wrong df?

ubizjak at gmail dot com wrote:
> --- Comment #5 from ubizjak at gmail dot com 2007-09-25 13:58 ---
> (In reply to comment #1)
>> #define add_ss(sh, sl, ah, al, bh, bl) \
>>   __asm__ ("addq %5,%q1\n\tadcq %3,%q0" \
>>            : "=r" (sh), "=r" (sl) \
>>            : "0" ((UDItype)(ah)), "rme" ((UDItype)(bh)), \
>>              "%1" ((UDItype)(al)), "rme" ((UDItype)(bl)))
>
> marking %0 early-clobbered fixes the problem.
>
> We also have longlong.h in gcc/ directory, where
>
> #define add_ss(sh, sl, ah, al, bh, bl) \
>   __asm__ ("addq %5,%1\n\tadcq %3,%0" \
>            : "=r" ((UDItype) (sh)), \
>              "=r" ((UDItype) (sl)) \
>            : "%0" ((UDItype) (ah)), \
>              "rme" ((UDItype) (bh)), \
>              "%1" ((UDItype) (al)), \
>              "rme" ((UDItype) (bl)))

I think that both versions ought to work.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33552
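The asm macros quoted above compute a two-word sum, (sh:sl) = (ah:al) + (bh:bl): addq adds the low words and adcq adds the high words plus the carry. A portable C reference for that semantics can be handy when checking whether a given compiler miscompiles the asm version. This is only a sketch assuming 64-bit words via uint64_t; the name add_ss_ref is hypothetical and is not part of gmp or of GCC's longlong.h.

```c
#include <stdint.h>

/* Portable reference for the two-word addition done by add_ss with
   addq/adcq: (*sh:*sl) = (ah:al) + (bh:bl), modulo 2^128.
   Hypothetical helper for illustration only.  */
void
add_ss_ref (uint64_t *sh, uint64_t *sl,
            uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
  uint64_t lo = al + bl;        /* addq: low words, wraps modulo 2^64 */
  uint64_t carry = lo < al;     /* 1 if the low addition carried out */
  *sh = ah + bh + carry;        /* adcq: high words plus the carry */
  *sl = lo;
}
```

For example, add_ss_ref (&h, &l, 0, UINT64_MAX, 0, 1) yields h == 1 and l == 0, because the low addition carries into the high word. On an x86_64 host, comparing this against the asm macro over many inputs is one simple way to spot wrong code.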
[Bug rtl-optimization/33552] wrong code for multiple output asm, wrong df?
--- Comment #8 from paolo dot bonzini at lu dot unisi dot ch 2007-09-25 14:40 ---
Subject: Re: wrong code for multiple output asm, wrong df?

> There is a comment in the '%' documentation:
> GCC can only handle one commutative pair in an asm; if you use more,
> the compiler may fail. Note that you need not use the

Right, but then it is gmp (the subject of this PR) that is right and gcc that is wrong. You said it the other way round. :-)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33552
[Bug middle-end/32758] [4.3 Regression] ecj1 hangs
--- Comment #23 from paolo dot bonzini at lu dot unisi dot ch 2007-08-29 11:47 ---
Subject: Re: [4.3 Regression] ecj1 hangs

> df_simulate_one_insn_forwards and df_simulate_one_insn_backwards (why we
> have the former when nothing ever uses it?) both call df_simulate_fixup_sets
> to fix this up, shouldn't dce_process_block call that too?

Yes, it was probably an oversight of this patch:

    2007-05-21  Kenneth Zadeck  [EMAIL PROTECTED]

        * dbgcnt.def: Fixed comment.
        * df-scan.c (df_get_regular_block_artificial_uses): Added frame
        pointer after reload if frame_pointer_needed.
        * df.h (df_simulate_defs, df_simulate_uses): Made public.
        * df-problems.c (df_simulate_defs, df_simulate_uses): Made public.
        * dce.c (deletable_insn_p): Only allow frame-related insns to be
        deleted if there is a REG_MAYBE_DEAD note.
        (dce_process_block): Now uses df_simulate_defs and df_simulate_uses.

It should be placed outside the if, i.e. you should always execute it.

You can CC me when you send the patch to the mailing list (but I'll be out of office from tomorrow to Sunday).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32758
[Bug middle-end/32758] [4.3 Regression] ecj1 hangs
--- Comment #24 from paolo dot bonzini at lu dot unisi dot ch 2007-08-29 12:15 ---
Subject: Re: [4.3 Regression] ecj1 hangs

> Here is what I will try to regtest (already verified it fixes the testcase).

This is wrong, because local_live changes during execution of dce_process_block. The {eh,regular}_block_artificial_uses must always be set (they are live throughout, not just at the bottom of the basic block).

> Alternatively, artificial_live could be a pointer bitmap which would just
> point to one of the df->*_block_artificial_uses bitmap, though not sure if
> that ever could or in some register that wasn't originally set in local_live.

Much better, and would have the same effect as df_simulate_fixup_sets.

> Certainly calling df_has_eh_preds just once per bb rather than per insn is
> IMHO worthwhile.

A patch to do so is preapproved if you add a comment saying

  /* Calling df_simulate_fixup_sets has the disadvantage of calling
     df_has_eh_preds once per insn, so we cache the information here.  */

before the place where you set artificial_live. Thanks!

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32758
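The point about artificial uses can be illustrated with a toy model of backward dead-code elimination over one basic block. It is a sketch, not GCC's dce.c: registers are bits in a 32-bit mask, and toy_insn and toy_dce_process_block are hypothetical names. The artificial-use set is chosen once per block (the analogue of calling df_has_eh_preds once per bb) and OR'ed back into the live set after every needed insn, so that registers in that set stay live throughout the block rather than only at its bottom.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_regset;   /* one bit per register: a stand-in for a bitmap */

struct toy_insn
{
  toy_regset defs;
  toy_regset uses;
  bool has_side_effects;       /* never deletable */
  bool marked;                 /* output: true if the insn must be kept */
};

/* Hypothetical analogue of dce_process_block: walk the block backwards,
   mark needed insns and keep LIVE up to date.  */
void
toy_dce_process_block (struct toy_insn *insns, size_t n, toy_regset live_out,
                       bool has_eh_preds, toy_regset regular_artificial,
                       toy_regset eh_artificial)
{
  /* Decided once per block, not once per insn.  */
  toy_regset artificial = has_eh_preds ? eh_artificial : regular_artificial;
  toy_regset live = live_out | artificial;

  for (size_t i = n; i-- > 0; )
    {
      struct toy_insn *insn = &insns[i];
      insn->marked = insn->has_side_effects || (insn->defs & live) != 0;
      if (!insn->marked)
        continue;                  /* dead: its uses do not become live */
      live &= ~insn->defs;         /* kill what the insn defines */
      live |= insn->uses;          /* add what it uses */
      live |= artificial;          /* artificial uses stay live everywhere */
    }
}
```

With an empty live_out and register 0 (say, the frame pointer) as the only artificial use, the block "fp = ...; r3 = r2; fp = r2" keeps both frame-pointer sets and marks only the unused r3 = r2 as dead. Dropping the `live |= artificial` line would make the first frame-pointer set look dead.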
[Bug tree-optimization/33158] missed store sinking opportunity
--- Comment #2 from paolo dot bonzini at lu dot unisi dot ch 2007-08-24 14:53 ---
Subject: Re: missed store sinking opportunity

Danny said he knows how to fix it (I guess in store sinking, though he didn't say).

From knowing him, there might be additional less obvious cases that this fix might optimize, and that would not be optimized by phiopt.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33158
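Store sinking, in its simplest form, moves stores that appear on every incoming path of a join point into the join block, so that the branches only compute the value to store. The testcase from this PR is not quoted here, so the following is a generic source-level illustration with made-up function names, not the PR's code or GCC's implementation.

```c
/* Before: one store in each arm of the conditional.  */
void
store_before (int *p, int c, int a, int b)
{
  if (c)
    *p = a;
  else
    *p = b;
}

/* After sinking the two stores into the join point: the arms only select
   a value (a PHI node in SSA form) and a single store remains.  */
void
store_after (int *p, int c, int a, int b)
{
  int v;
  if (c)
    v = a;
  else
    v = b;
  *p = v;
}
```

Both functions have the same observable effect; the second form has one store instead of two, and the remaining conditional only chooses a value.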
[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc
--- Comment #41 from paolo dot bonzini at lu dot unisi dot ch 2007-08-06 11:52 ---
Subject: Re: [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

> This is now more like a meta-bug, see the other two bugs which are opened
> for the current issues (yes both are assigned to me and both are actively
> being worked on, well one is depend on the other but still being worked on).

Ah, I see.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690
[Bug middle-end/32004] [4.1/4.2/4.3 regression] : can't find a register in class 'GENERAL_REGS' while reloading 'asm'
--- Comment #36 from paolo dot bonzini at lu dot unisi dot ch 2007-07-13 09:57 ---
Subject: Re: [4.1/4.2/4.3 regression] : can't find a register in class 'GENERAL_REGS' while reloading 'asm'

kkojima at gcc dot gnu dot org wrote:
> --- Comment #33 from kkojima at gcc dot gnu dot org 2007-07-12 01:11 ---
> It seems that the patch 126418 causes an ICE for gcc.dg/asm-4.c and
> the patch 126487 breaks gcc.c-torture/compile/2804-1.c on 4.2 for SH.
> Both failures happen only with -O0. It looks ia64 testresults show
> similar errors:
> http://gcc.gnu.org/ml/gcc-testresults/2007-07/msg00309.html
> http://gcc.gnu.org/ml/gcc-testresults/2007-07/msg00446.html

The former is easy and I have a patch. I don't understand the latter, though: it looks like (on sh at least) reload is not able to make a valid address, and it might be a latent bug. Could anybody look at it?

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32004
[Bug middle-end/32004] [4.1/4.2/4.3 regression] : can't find a register in class 'GENERAL_REGS' while reloading 'asm'
--- Comment #34 from paolo dot bonzini at lu dot unisi dot ch 2007-07-12 19:01 ---
Subject: Re: [4.1/4.2/4.3 regression] : can't find a register in class 'GENERAL_REGS' while reloading 'asm'

kkojima at gcc dot gnu dot org wrote:
> --- Comment #33 from kkojima at gcc dot gnu dot org 2007-07-12 01:11 ---
> It seems that the patch 126418 causes an ICE for gcc.dg/asm-4.c and
> the patch 126487 breaks gcc.c-torture/compile/2804-1.c on 4.2 for SH.
> Both failures happen only with -O0. It looks ia64 testresults show
> similar errors:
> http://gcc.gnu.org/ml/gcc-testresults/2007-07/msg00309.html
> http://gcc.gnu.org/ml/gcc-testresults/2007-07/msg00446.html
> on 4.2.

I'll take a look, and revert the patches on 4.1, tomorrow.

Thanks,

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32004
[Bug rtl-optimization/28940] [4.0/4.1/4.2 Regression] address selection does not work correctly
--- Comment #15 from paolo dot bonzini at lu dot unisi dot ch 2007-07-05 10:46 ---
Subject: Re: [4.0/4.1/4.2 Regression] address selection does not work correctly

Yes, we should add a testcase.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28940
[Bug target/32437] [4.3 Regression] MIPS: FAIL in gcc.dg/cleanup-[8|9|10|11].c
--- Comment #9 from paolo dot bonzini at lu dot unisi dot ch 2007-06-23 16:03 ---
Subject: Re: [4.3 Regression] MIPS: FAIL in gcc.dg/cleanup-[8|9|10|11].c

Kenneth Zadeck wrote:
> This patch changes dce:deletable_insn_p so that it looks at all of the top
> level clauses in a parallel to make it's decision. It was not keeping insns
> that had a top level USE or UNSPEC if they were inside of parallels.
>
> This should fix pr32437 and perhaps other things.
>
> The patch has only been tested on ppc and x86-64. It is harmless on those
> platforms. It is likely to make a difference on pa-risc and mips where there
> are parallels that contain top level unspecs.
>
> Ok to commit?

ok

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32437
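The rule described in the patch can be modeled in a few lines: an insn whose pattern is a PARALLEL is deletable only if none of its top-level clauses is a USE or an UNSPEC. This is a toy sketch with hypothetical types (toy_code, toy_pattern) rather than GCC's rtx; the real deletable_insn_p checks other properties as well, and only the PARALLEL rule is modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; not GCC's rtx representation.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_USE, TOY_UNSPEC, TOY_PARALLEL };

struct toy_pattern
{
  enum toy_code code;
  const enum toy_code *clauses;   /* top-level clauses if code == TOY_PARALLEL */
  size_t n_clauses;
};

/* A top-level USE or UNSPEC keeps the insn alive.  */
static bool
toy_clause_deletable_p (enum toy_code code)
{
  return code == TOY_SET || code == TOY_CLOBBER;
}

/* Look at every top-level clause of a PARALLEL, not only at the
   outermost code of the pattern.  */
bool
toy_deletable_insn_p (const struct toy_pattern *pat)
{
  if (pat->code != TOY_PARALLEL)
    return toy_clause_deletable_p (pat->code);

  for (size_t i = 0; i < pat->n_clauses; i++)
    if (!toy_clause_deletable_p (pat->clauses[i]))
      return false;
  return true;
}
```

Under this model, (parallel [(set ...) (use ...)]) is kept, while (parallel [(set ...) (clobber ...)]) may still be deleted if its result is dead.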
[Bug tree-optimization/32390] tree-ssa-math-opts.c performs too many IL scans
--- Comment #4 from paolo dot bonzini at lu dot unisi dot ch 2007-06-19 05:09 ---
Subject: Re: tree-ssa-math-opts.c performs too many IL scans

> We have reciprocal pass (in fact CSE recip pass) that CSEs 1.0/z from x/z,
> y/z, .../z. This is done by scanning function for RDIV_EXPR, where
> denominator (z) is the same.
>
> If 1.0/func() -> rfunc() conversion is done before recip pass, we loose the
> ability to scan for RDIV_EXPRs and the ability to CSE the division.

You could still use a pointer_set or pointer_map to save where the RDIV_EXPRs are and avoid scanning the IL twice.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32390
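The suggestion amounts to one scan that records every division keyed by its denominator; the rewriting then works from the recorded positions instead of rescanning the IL. The toy IL below (integer variable ids, a fixed array standing in for pointer_map, and a min_uses threshold) is an assumption for illustration only. It is not tree-ssa-math-opts.c's code, and it only decides which divisions would be rewritten to use the reciprocal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16     /* toy IL: variables are small integer ids */
#define MAX_STMTS 64

/* A toy "dst = num / den" statement; uses_recip is set when it would be
   rewritten to "dst = num * recip", with recip = 1.0 / den.  */
struct toy_div
{
  int num, den;
  bool uses_recip;
};

/* Returns how many reciprocals 1.0/den would be inserted.  */
int
toy_cse_reciprocals (struct toy_div *stmts, size_t n, int min_uses)
{
  size_t where[MAX_VARS][MAX_STMTS];   /* stand-in for a pointer_map */
  size_t count[MAX_VARS] = { 0 };
  int inserted = 0;

  assert (n <= MAX_STMTS && min_uses > 0);

  /* The only scan of the IL: remember where each division is.  */
  for (size_t i = 0; i < n; i++)
    {
      int d = stmts[i].den;
      assert (d >= 0 && d < MAX_VARS);
      where[d][count[d]++] = i;
    }

  /* Rewrite from the recorded positions, without rescanning.  */
  for (int d = 0; d < MAX_VARS; d++)
    if (count[d] >= (size_t) min_uses)
      {
        inserted++;                    /* would emit recip = 1.0 / d */
        for (size_t k = 0; k < count[d]; k++)
          stmts[where[d][k]].uses_recip = true;
      }
  return inserted;
}
```

Because the positions are recorded before any 1.0/func() to rfunc() conversion runs, that conversion no longer has to wait for a second scan looking for RDIV_EXPRs.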
[Bug middle-end/32349] [4.3 Regression] ICE in df_refs_verify with -O2 -fmodulo-sched for spec tests
--- Comment #3 from paolo dot bonzini at lu dot unisi dot ch 2007-06-17 14:14 ---
Subject: Re: [4.3 Regression] ICE in df_refs_verify with -O2 -fmodulo-sched for spec tests

> ok to commit?

Yes.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32349
[Bug rtl-optimization/32355] [4.3 Regression] ICE in df_lr_verify_transfer_functions, at df-problems.c:1924
--- Comment #2 from paolo dot bonzini at lu dot unisi dot ch 2007-06-18 04:41 ---
Subject: Re: [4.3 Regression] ICE in df_lr_verify_transfer_functions, at df-problems.c:1924

> The possible second problem is that something in one of
>   delete_trivially_dead_insns
>   rebuild_jump_labels
>   cleanup_cfg
> may not work in deferred rescanning mode. This will wait for another bug
> report.

ok, but please file that report yourself and assign it to me.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32355
[Bug middle-end/32004] [4.3 regression] : gcc.target/i386/pr21291.c
--- Comment #15 from paolo dot bonzini at lu dot unisi dot ch 2007-05-21 09:41 ---
Subject: Re: [4.3 regression] : gcc.target/i386/pr21291.c

matz at gcc dot gnu dot org wrote:
> --- Comment #14 from matz at gcc dot gnu dot org 2007-05-21 09:35 ---
> Yes. The place where I would think the work-around to become definitive is
> exactly before regclass. Between it and reload nothing interesting happens
> transformation wise.

I was thinking of the same. Also, if the patch included adding a cfun->have_asm_statement flag set during RTL expansion, it would probably cost little to do it in an entirely new pass before regclass.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32004
[Bug debug/31412] [4.3] inf loop/long compile time, time spent in var-tracking.c
--- Comment #12 from paolo dot bonzini at lu dot unisi dot ch 2007-04-03 13:59 ---
Subject: Re: [4.3] inf loop/long compile time, time spent in var-tracking.c

> With dataflow branch that was compiled with profiling the testcase finishes
> not too slow:

This suggests that it is a bug in the dataflow computation, causing it to oscillate and not terminate. Different RTL just causes the bug not to trigger.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31412
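The diagnosis rests on a property of correct iterative dataflow solvers: with monotone transfer functions over finite bit sets, each set can only grow, so the iteration reaches a fixed point after a bounded number of passes, and a solver that oscillates points to a bug rather than to a slow input. The sketch below is a textbook round-robin solver with a pass limit that reports non-convergence. It uses backward liveness only because it is the simplest familiar problem (var-tracking solves a different one), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_regset;   /* one bit per register */

#define MAX_SUCCS 2

struct toy_bb
{
  toy_regset use, def;         /* upward-exposed uses and definitions */
  size_t succs[MAX_SUCCS];
  size_t n_succs;
  toy_regset in, out;          /* solution */
};

/* Round-robin backward liveness: out = union of the successors' in,
   in = use | (out & ~def).  Returns the number of passes needed to reach
   the fixed point, or -1 if MAX_PASSES is exceeded, which is how an
   oscillating (buggy) solver would show up.  */
int
toy_solve_liveness (struct toy_bb *bbs, size_t n, int max_passes)
{
  for (size_t i = 0; i < n; i++)
    bbs[i].in = bbs[i].out = 0;

  for (int pass = 1; pass <= max_passes; pass++)
    {
      bool changed = false;
      for (size_t i = n; i-- > 0; )   /* reverse order suits a backward problem */
        {
          struct toy_bb *bb = &bbs[i];
          toy_regset out = 0;
          for (size_t s = 0; s < bb->n_succs; s++)
            out |= bbs[bb->succs[s]].in;
          toy_regset in = bb->use | (out & ~bb->def);
          if (in != bb->in || out != bb->out)
            changed = true;
          bb->in = in;
          bb->out = out;
        }
      if (!changed)
        return pass;
    }
  return -1;
}
```

For a three-block loop (bb0 defines r0, bb1 loops on itself using r0 and defining r1, bb2 uses r1) the solver settles after three passes; a transfer function that could shrink a set as well as grow it would lose this guarantee.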
[Bug target/29487] Shared libstdc++ fails to link
--- Comment #29 from paolo dot bonzini at lu dot unisi dot ch 2007-02-06 08:26 ---
Subject: Re: Shared libstdc++ fails to link

> Paolo, would you be able to undo the change to make foo not marked
> TREE_NOTHROW? IIUC, that would be different than the patch you posted in
> Comment #22, which appears to affect bar.

My patch was related to Richi's comment when CCing me; the only patch I found from me that was related to TREE_NOTHROW subtly changed the semantics, and that patch sort of undid that change (I say "sort of" because it actually takes a little more care).

> Also, I didn't quite understand your patch, in that it would appear to
> result in fewer functions being marked TREE_NOTHROW

It would result in more functions being marked TREE_NOTHROW, since those functions will not go through the insn loop in set_nothrow_function_flags if the front-end declared them nothrow.

I'm attaching an updated version of the patch, which passes some internal (i.e. my brain and a small testcase using weak and nothrow) sanity checks, but I've not tried it in any real-world situation. It's actually an alternate fix for PR29323 which doesn't trigger this bug. I'll let other people consider whether it makes any sense, since I'm not at all an expert in this area.

> I would think we want to remove the check for binds_local_p at the top of
> set_nothrow_function_flags?

Agreed, and we have to move it elsewhere. (See the upcoming patch.)

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29487
[Bug testsuite/29404] make check fails to compile library testcases
--- Comment #7 from paolo dot bonzini at lu dot unisi dot ch 2007-02-01 06:13 ---
Subject: Re: make check fails to compile library testcases

I'll take a look. Any ideas?

Sure, I'm just a little busy.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29404
[Bug bootstrap/30541] Top-level should pass GNATBIND, GNATLINK and GNATMAKE variables down
--- Comment #13 from paolo dot bonzini at lu dot unisi dot ch 2007-01-23 14:55 ---
Subject: Re: Top-level should pass GNATBIND, GNATLINK and GNATMAKE variables down

>> True, they seem to be unused, but it's better to be consistent; for the
>> same reason I prefer to pass GNATLINK down too via flags_to_pass.
>
> Not only they seem, but they ARE unused.
> [...]
> So I see no reason to add GNATLINK here really.

I meant they seem to be unused because we always rely on the GNATMAKE/GNATLINK passed by a higher-level makefile. It's good to write in the makefile the variables that you use -- even if they are always overridden.

> Now, gnatmake is also needed because of some support tools, but logically,
> these support tools should only be built once before bootstrap starts, in
> order to generate automatically files.

Note that the exact same steps are followed in all three stages. stage1 does not have *anything* magic in it. If gnatmake is used in stage1 to build support tools, the same tools will be built in stage2 (using gnatmake) unless these tools write in the srcdir.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30541
[Bug tree-optimization/17687] sincos tree representation causes extra addressable vars
--- Comment #16 from paolo dot bonzini at lu dot unisi dot ch 2006-12-06 09:58 ---
Subject: Re: sincos tree representation causes extra addressable vars

> Paolo, are you working on this?

No. :-(

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
[Bug rtl-optimization/29840] [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)
--- Comment #27 from paolo dot bonzini at lu dot unisi dot ch 2006-12-02 09:27 ---
Subject: Re: [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)

dave at hiauly1 dot hia dot nrc dot ca wrote:
> --- Comment #25 from dave at hiauly1 dot hia dot nrc dot ca 2006-12-01 22:22 ---
> Subject: Re: [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 MRO
>
> DF_REF_INSN (def) is 0. It looks like the ICE can be avoided by a check on
> def_insn. The attached patch seems to get by the ICE.

I'm pretty sure that's the same issue as the second and third hunks of http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00891.html

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29840
[Bug rtl-optimization/29840] [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)
--- Comment #29 from paolo dot bonzini at lu dot unisi dot ch 2006-12-02 17:38 ---
Subject: Re: [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)

>> I'm pretty sure that's the same issue as the second and third hunk of
>> http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00891.html
>
> I see this was committed to the dataflow branch. What about the trunk?

I will test on a cross if it fixes the failure on hppa. It seemed not to be necessary on the trunk (it fixed a very early bootstrapping failure on x86-linux, as soon as building stage1 libgcc).

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29840
[Bug rtl-optimization/29840] [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)
--- Comment #23 from paolo dot bonzini at lu dot unisi dot ch 2006-11-30 19:18 ---
Subject: Re: [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)

> I had an unexpected eye operation Tuesday and the vision in my right eye is
> now half blocked by a gas bubble. The bubble is supposed to slowly decrease
> in size over a few weeks. Don't know when reasonable visiblity will return
> and when i'll be able to look at this.

No problem. If the compiler is not being miscompiled, I will be able to look at it with a cross. Good luck!

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29840
[Bug rtl-optimization/29840] [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)
--- Comment #17 from paolo dot bonzini at lu dot unisi dot ch 2006-11-26 09:05 ---
Subject: Re: [4.3 Regression] build/genconditions ../../gcc/gcc/config/pa/pa.md tmp-condmd.c: /bin/sh: 13354 Memory fault(coredump)

I wonder if it is enough to just add DF_HARD_REGS in the df_init call?

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29840
[Bug c/28940] [4.0/4.1/4.2 Regression] address selection does not work correctly
--- Comment #11 from paolo dot bonzini at lu dot unisi dot ch 2006-10-11 13:05 ---
Subject: Re: [4.0/4.1/4.2 Regression] address selection does not work correctly

>         movl    8(%ebp), %edx
>         addl    $1, %edx
>         movsbl  b(%edx),%eax
>         movsbl  a(%edx),%edx

No, the good behavior has b+1(%edx) and a+1(%edx) (for non-PIC code).

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28940
[Bug middle-end/28690] [4.2 Regression] Performace problem with indexed load/stores on powerpc
--- Comment #18 from paolo dot bonzini at lu dot unisi dot ch 2006-10-03 05:20 ---
Subject: Re: [4.2 Regression] Performace problem with indexed load/stores on powerpc

>         * rtlanal.c (swap_commutative_operands_p): Preference a REG_POINTER
>         over a non REG_POINTER.
>         * tree-ssa-address.c (gen_addr_rtx): Force a REG_POINTER to be the
>         first operand of a PLUS.

This is more gentle indeed. Be careful, however, as functions calling commutative_operand_precedence directly may have a problem with that.

Can you try making an address illegitimate if it is non-REG_POINTER + REG_POINTER? Or set up splitters to do the transformation just before reload?

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690
[Bug java/28938] [ecj] update build instructions to account for changes
--- Comment #3 from paolo dot bonzini at lu dot unisi dot ch 2006-09-21 08:21 ---
Subject: Re: [ecj] update build instructions to account for changes

> This is found using the normal gcc specs approach. In a distribution I'd
> expect ecj1 to end up in the gcc-lib dir. In my case I just have it on my
> PATH.
>
> We won't be including the ecj sources in the gcc tree. As I recall that was
> rejected by the SC. So it will always be a separate download.

The best thing would be if I could just "sudo apt-get install ecj". If there are any differences between ecj and ecj1, we should provide some kind of wrapper.

A nice possibility would be to support dropping the downloaded JAR somewhere in the tree where it installs correctly and automagically. This would not be against the SC decision.

> Both ecj and the new gcjh can be run on any vm, including all the free ones.
> I've built libgcj many times running these purely interpreted and it is not
> painfully slow.

Cool, though not unexpected, because the code generation of Java bytecodes is not that hard (apart from the unreachable and uninitialized code checking).

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28938
[Bug bootstrap/28770] one reference to powerpc-ibm-eabi-ar.exe when only xar.exe installed
--- Comment #7 from paolo dot bonzini at lu dot unisi dot ch 2006-08-18 14:08 ---
Subject: Re: one reference to powerpc-ibm-eabi-ar.exe when only xar.exe installed

etienne_lorrain at yahoo dot fr wrote:
> --- Comment #6 from etienne_lorrain at yahoo dot fr 2006-08-18 13:55 ---
> I do have $(HOME)/local/powerpc-ibm-eabi/bin/ar.exe and I am using
> $(HOME)/local/bin/xar.exe for my stuff here, after install.
> To bootstrap, GCC may better use $(HOME)/local/powerpc-ibm-eabi/bin/ar.exe
> but that will not be in the path, so GCC needs to call it with full path.

For 4.2.0, it will find it and use it:

  if test x$host = x$build && test -f $srcdir/gcc/BASE-VER; then
    ...
    gcc_cv_tool_dirs=$gcc_cv_tool_dirs$gcc_cv_tool_prefix/$target_noncanonical/bin$PATH_SEPARATOR
  else
    gcc_cv_tool_dirs=
  fi
  ...
    if test -z $ac_cv_path_$1 ; then
      AC_PATH_PROG([$1], [$2], [], [$gcc_cv_tool_dirs])
    fi

What 4.2.0 does is to use the same algorithm that the compiler will use to find the assembler/linker, and apply it to other tools such as ar. We decided that a configuration where this breaks is already broken too much.

For 4.1.x it was somewhat of a mess. What people were really doing in the wild was not yet as clear as it became by the time we finished cleaning up this stuff, so there were some unintended changes in the behavior. But the combined tree will surely work.

> I was thinking combined tree was not as good, mostly because I had to select
> which common part of the trees to keep - and well, I may have choosen the
> binutils ones.

gcc should always win over binutils. That's by design. Changes to the toplevel are almost always driven by changes in gcc -- the binutils tree is mostly agnostic and just follows what gcc does.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28770
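AC_PATH_PROG with an explicit path argument searches that list of directories, in order, for an executable with the given name; that is how the $gcc_cv_tool_dirs list above takes effect before the plain PATH is consulted. A rough C analogue, assuming a POSIX system (access with X_OK) and a single-character separator; toy_find_tool is a hypothetical name, not part of GCC or Autoconf.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search a SEP-delimited directory list for an executable named PROG,
   trying the directories in order.  On success, writes "dir/prog" into
   BUF and returns BUF; otherwise returns NULL.  */
const char *
toy_find_tool (const char *dirs, char sep, const char *prog,
               char *buf, size_t bufsize)
{
  const char *p = dirs;

  while (*p)
    {
      const char *end = strchr (p, sep);
      size_t len = end ? (size_t) (end - p) : strlen (p);

      if (len > 0)
        {
          int n = snprintf (buf, bufsize, "%.*s/%s", (int) len, p, prog);
          if (n > 0 && (size_t) n < bufsize && access (buf, X_OK) == 0)
            return buf;            /* first match wins */
        }
      if (!end)
        break;
      p = end + 1;                 /* skip the separator */
    }
  return NULL;
}
```

Because the first matching directory wins, putting $gcc_cv_tool_prefix/$target_noncanonical/bin early in the list is what makes the target-specific ar.exe preferred over whatever else happens to be installed.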
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #59 from paolo dot bonzini at lu dot unisi dot ch 2006-08-10 06:52 ---
Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

> Thanks for the response, but I believe you are conflating two issues (as is
> this flag, which is why this is bad news). Different answers to the question
> what is this sum does not ruin IEEE compliance. I am referring to IEEE 754,
> which is a standard set of rules for storage and arithmetic for floating
> point (fp) on modern hardware.

You are also confusing -funsafe-math-optimizations with -ffast-math. The latter is a catch-all flag that compiles as if there were no FP traps, infinities, NaNs, and so on. The former instead enables unsafe optimizations but not catastrophic optimizations -- if you consider meaningless results on badly conditioned matrices not to be catastrophic...

A more or less complete list of things enabled by -funsafe-math-optimizations includes:

Reassociation:
- reassociation of operations, not only for the vectorizer's sake but also in the unroller (see around line 1600 of loop-unroll.c)
- other simplifications like a/(b*c) for a/b/c
- expansion of pow (a, b) to multiplications if b is integer

Compile-time evaluation:
- doing more aggressive compile-time evaluation of floating-point expressions (e.g. cabs)
- less accurate modeling of overflow in compile-time expressions, for formats such as 106-bit mantissa long doubles

Math identities:
- expansion of cabs to sqrt (a*a + b*b)
- simplifications involving transcendental functions, e.g. exp (0.5*x) for sqrt (exp (x)), or x for tan(atan(x))
- moving terms to the other side of a comparison, e.g. a < 4 for a + 4 < 8, or x > -1 for 1 - x < 2
- assuming in-domain arguments of sqrt, log, etc., e.g. x for sqrt(x)*sqrt(x)
- in turn, this enables removing math functions from comparisons, e.g. x < 4 for sqrt (x) < 2

Optimization:
- strength reduction of a/b to a*(1/b), both as loop invariants and in code like vector normalization
- eliminating recursion for accumulator-like functions, i.e. f (n) = n + f(n-1)

Back-end operation:
- using x87 builtins for transcendental functions

There may be bugs, but in general these optimizations are safe for infinities and NaNs, but not for signed zeros or (as I said) for very badly conditioned data.

> I am unaware of their being any rules on compilation.

Rules are determined by the language standards. I believe that C mandates no reassociation; Fortran allows reassociation unless explicit parentheses are present in the source, but this is not (yet) implemented by GCC.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
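The strength-reduction item in the list above is easy to check by hand: a/b is rounded once, while a*(1/b) is rounded twice, so the two can differ in the last bit. A minimal C check, assuming IEEE 754 double arithmetic without excess precision (for example x86-64 with SSE2 math) and a compile without -funsafe-math-optimizations or -ffast-math, so that the compiler keeps both expressions as written. The function names are hypothetical.

```c
/* Division as written in the source: one rounding.  */
double
div_direct (double a, double b)
{
  return a / b;
}

/* What strength reduction of a/b to a*(1/b) computes: one rounding for
   1/b and another for the product.  */
double
div_by_reciprocal (double a, double b)
{
  double r = 1.0 / b;   /* in a loop, this is the invariant that gets hoisted */
  return a * r;
}
```

With a = 3.0 and b = 10.0, the direct division gives the double nearest to 0.3, while the reciprocal version gives 0.30000000000000004, one unit in the last place higher.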
[Bug tree-optimization/17687] sincos can be folded at the tree level
--- Comment #9 from paolo dot bonzini at lu dot unisi dot ch 2006-08-10 08:04 --- Subject: Re: sincos can be folded at the tree level If this PR was only about x87 using fsincos for sincos this is fixed now. Well, it was mostly about getting rid of TREE_ADDRESSABLE, which you can really do efficiently only on x87, using fsincos. But maybe it's time to change the subject. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #61 from paolo dot bonzini at lu dot unisi dot ch 2006-08-10 14:28 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 Making vectorization depend on a flag that says it is allowed to violate IEEE is therefore a killer for me (and most knowledgeable fp guys). This is ironic, since vectorization of sums (as in GEMM) is usually implemented as scalar expansion on the accumulators. In the case of GCC, it performs the transformation that Dorit explained. It may not produce an IEEE-compliant answer if there are zeros and you expect to see a particular sign for the zero. and this not only produces an IEEE-compliant answer The IEEE standard mandates particular rules for performing operations on infinities, NaNs, signed zeros, denormals, ... The C standard, by mandating no reassociation, ensures that you don't mess with NaNs, infinities, and signed zeros. As soon as you perform reassociation, there is *no way* you can be sure that you get IEEE-compliant math. +Inf + (1 / +0) = Inf, +Inf + (1 / -0) = NaN. but it is *more* accurate for almost all data. http://citeseer.ist.psu.edu/589698.html is an example of a paper that shows FP code that avoids accuracy problems. Any kind of reassociation will break that code, and lower its accuracy. That's why reassociation is an unsafe math optimization. If you want a -freassociate-fp math, open an enhancement PR and somebody might be more than happy to separate reassociation from the other effects of -funsafe-math-optimizations. (Independent of this, you should also open a separate PR for ATLAS vectorization, because that would not be a regression and would not be on x87) :-) Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
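Compensated (Kahan) summation is one well-known example of FP code that depends on the evaluation order written in the source; the linked paper is not necessarily about this exact algorithm. In the sketch below the correction term (t - s) - y is algebraically zero, so a compiler that is allowed to reassociate may fold it away and silently turn kahan_sum back into naive_sum. It assumes FLT_EVAL_METHOD == 0 and default (non -ffast-math) flags; make_demo_input is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation. */
static double naive_sum(const double *x, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += x[i];
  return s;
}

/* Kahan compensated summation: c carries the low-order bits lost when
   y is added to s.  Under reassociation (t - s) - y simplifies to 0. */
static double kahan_sum(const double *x, size_t n)
{
  double s = 0.0, c = 0.0;
  for (size_t i = 0; i < n; i++) {
    double y = x[i] - c;
    double t = s + y;
    c = (t - s) - y;
    s = t;
  }
  return s;
}

/* Hypothetical test input: 1.0 followed by n - 1 copies of 1e-16. */
static void make_demo_input(double *x, size_t n)
{
  x[0] = 1.0;
  for (size_t i = 1; i < n; i++)
    x[i] = 1e-16;
}
```

With 1.0 followed by 10000 copies of 1e-16, naive_sum returns exactly 1.0 (each 1e-16 is below half an ulp of 1.0 and is rounded away), while kahan_sum returns approximately 1.0 + 1e-12.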
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #63 from paolo dot bonzini at lu dot unisi dot ch 2006-08-10 15:22 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 If you want a -freassociate-fp math, open an enhancement PR and somebody Ah, you mean like I asked about at the end of the 2nd paragraph of Comment #56? (Independent of this, you should also open a separate PR for ATLAS vectorization, because that would not be a regression and would not be on x87) :-) You mean like I pleaded for in the last paragraph of Comment #38 Be bold. Don't ask, just open PRs if you feel an issue is separate. Go ahead now if you wish. Having them closed or marked as duplicate is not a problem, and it is much easier to track than cluttering an existing PR. All these issues with ATLAS will not be visible to somebody looking for bug fixes known to fail in 4.2.0, because the original problem is now fixed in that version, and will soon be in 4.1.1 too. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #48 from paolo dot bonzini at lu dot unisi dot ch 2006-08-08 07:05 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 In the x86/x86-64 world one can be almost sure that the load+execute instruction pair will execute (marginally to noticeably) faster than the move+load-and-execute instruction pair, as the more complex instructions are harder for on-chip scheduling (they retire later). Yes, so far so good and this part has already been committed. But does a *single* load-and-execute instruction execute faster than the two instructions in a load+execute sequence? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #51 from paolo dot bonzini at lu dot unisi dot ch 2006-08-09 04:33 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 I've been scoping this a little closer on the Athlon64X2. I have found that the patched gcc can achieve as much as 93% of theoretical peak (5218Mflop on a 2800MHz Athlon64X2!) for in-cache matmul when the code generator is allowed to go to town. Not unexpected. Code was so tightly tuned for GCC 3, and so big were the changes between GCC 3 and 4, that you were comparing sort of apples to oranges. It could be interesting to see which different optimizations are performed by your code generator for GCC 3 vs. GCC 4. fmull 1440(%rcx) #else fldl 1440(%rcx) fmulp %st,%st(1) #endif To my surprise, on this arch, using the fldl/fmulp pair caused a performance drop. So, either my SSE experience does not necessarily translate to x87, or the Opteron (where I did the SSE tuning) is subtly different from the Athlon64X2, or my memory of the tuning is faulty. Just as a check, Paolo: is this the peephole you would do? In some sense, this is the peephole I would rather *not* do. But the answer is yes. :-) So, do you now agree that the bug would be fixed if the patch that is in GCC 4.2 were backported to GCC 4.1 (so that your users can use that)? And do you still see the abysmal x87 single-precision FP performance? Thanks! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #40 from paolo dot bonzini at lu dot unisi dot ch 2006-08-07 16:58 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 I don't see how the last fmul[sl] can be removed without increasing code size. However, I can see that the peephole phase might not be able to change the register usage. Actually, the peephole phase may not change the register usage, but it could use a scratch register if available. But it would be much more controversial (even if backed by your hard numbers on ATLAS) to state that splitting fmul[sl] to fld[sl]+fmul is always beneficial, unless there is some manual telling us exactly that... for example it would be a different story if it could give higher scheduling freedom (stuff like VectorPath vs. DirectPath on Athlons), and if we could figure out on which platforms it improves performance. On this front, is there some reason you cannot post the patch(es) as attachments, just to rule out copy problems, as I've asked in last several messages? Note there's no need if I can grab your stuff from SVN, as below . . . You already found out about this :-P Unfortunately I mistyped the PR number when I committed the patch; I meant the commit to appear in the audit trail, so that you'd have seen that I had committed it. because my tests were run on a similar Prescott (P4e) You didn't post the gcc 3 performance numbers. What were those like? If you beat/tied gcc 3, then the remaining fmul[l,s] are probably not a big deal. If gcc 3 is still winning, on the other hand . . . I don't have GCC 3 on that machine. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #42 from paolo dot bonzini at lu dot unisi dot ch 2006-08-07 18:19 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 We should get some idea by comparing gcc3 vs. your patched compiler on the various platforms, though other gcc3/4 changes will cloud the picture somewhat . . . That's why you should compare 4.2 before and after my patch, instead. If this kind of machine difference in optimality holds true for x87 as well, I assume a new peephole phase that looks for the scratch register could be called if the appropriate -march were thrown? Or you can disable the fmul[sl] instructions altogether. Speaking of -march issues, when I get a compiler build that gens your new code, I will pull the assembly trick to try it on the CoreDuo as well. If the new code is worse, you can probably not call your present peephole if that -march is thrown? I'd find it very strange. It is more likely that the Core Duo has a more powerful scheduler (maybe the micro-op fusion thing?) that does not dislike fmul[sl]. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27390] [4.2 Regression] gcc.target/x86_64/abi/test_complex_returning.c execution fails at -O0
--- Comment #5 from paolo dot bonzini at lu dot unisi dot ch 2006-05-22 07:35 --- Subject: Re: [4.2 Regression] gcc.target/x86_64/abi/test_complex_returning.c execution fails at -O0 It was mentioned in http://gcc.gnu.org/ml/gcc-patches/2006-04/msg00390.html Also. And nothing has been done about the problem since that message. Sorry, my understanding was that reverting the regclass.c hunk had fixed the problem. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27390
[Bug middle-end/26869] [4.1/4.2 Regression] Segfault in find_lattice_value() for complex operands.
--- Comment #7 from paolo dot bonzini at lu dot unisi dot ch 2006-04-18 14:47 --- Subject: Re: [4.1/4.2 Regression] Segfault in find_lattice_value() for complex operands. rguenth at gcc dot gnu dot org wrote: --- Comment #6 from rguenth at gcc dot gnu dot org 2006-04-18 14:46 --- I'll bootstrap test the obvious patch then. It's *NOT* obvious. Please have it reviewed! Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26869
[Bug target/27006] [4.1/4.2 Regression] Invalid altivec constant loading code
--- Comment #11 from paolo dot bonzini at lu dot unisi dot ch 2006-04-14 07:27 --- Subject: Re: [4.1/4.2 Regression] Invalid altivec constant loading code I'm not sure why you think that two splats and two adds is too expensive. I'd hope that these constants stay in the cache... I'd be more interested in implementing a splat+shift combination to make EASY_VECTOR_15_ADD_SELF more generic. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27006
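The splat-based options can be sketched as a small classifier over element values. easy_vector_15 and easy_vector_15_add_self are modeled loosely on the rs6000 macros EASY_VECTOR_15 and EASY_VECTOR_15_ADD_SELF (the exact definitions in the port may differ), and easy_vector_splat_shift is a hypothetical helper for the splat+shift idea. On AltiVec the per-element shift count comes from a vector register, so a real implementation would also have to materialize it.

```c
#include <assert.h>
#include <stdbool.h>

/* vspltis{b,h,w} take a 5-bit signed immediate, so any element value
   in -16..15 costs a single instruction. */
static bool easy_vector_15(int n)
{
  return n >= -16 && n <= 15;
}

/* Out of range, but even with n / 2 in range: splat n / 2 and add the
   register to itself (two instructions). */
static bool easy_vector_15_add_self(int n)
{
  return !easy_vector_15(n) && (n & 1) == 0 && easy_vector_15(n / 2);
}

/* Hypothetical generalization: n == s * 2^k with s in -16..15 and
   0 < k < elt_bits, i.e. splat s, then shift each element left by k.
   The shift count also needs a register, so this likely costs more
   than add-self. */
static bool easy_vector_splat_shift(int n, int elt_bits, int *splat, int *shift)
{
  for (int k = 1; k < elt_bits; k++) {
    long long unit = 1LL << k;
    if (n % unit == 0 && easy_vector_15((int) (n / unit))) {
      *splat = (int) (n / unit);
      *shift = k;
      return true;
    }
  }
  return false;
}
```

For example, 30 is reachable as splat 15 + add-self, while 64 in a halfword vector would need splat 8 shifted left by 3.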
[Bug target/27117] [4.2 Regression] gcc fails to build on sh64-elf targets
--- Comment #4 from paolo dot bonzini at lu dot unisi dot ch 2006-04-12 14:09 --- Subject: Re: [4.2 Regression] gcc fails to build on sh64-elf targets I think the best solution is to have an INDEX_REG_CLASS_FOR_MODE macro, which defaults to INDEX_REG_CLASS. Then this macro can be defined for the SH to return GENERAL_REGS for DImode when compiling SHmedia code. Thanks for the analysis. I quickly tested that your approach works for Kaz's testcase. However, I don't feel confident enough to write this patch, and even less to document it. Are you going to do it, or should I go ahead and revert the regclass.c change? This is what I tested BTW:
Index: reload.c
===
--- reload.c (revision 112658)
+++ reload.c (working copy)
@@ -5316,7 +5316,7 @@ find_reloads_address_1 (enum machine_mod
   RTX_CODE code = GET_CODE (x);
   if (context == 1)
-    context_reg_class = INDEX_REG_CLASS;
+    context_reg_class = GET_MODE (x) == DImode ? GENERAL_REGS : INDEX_REG_CLASS;
   else
     context_reg_class = base_reg_class (mode, outer_code, index_code);
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27117
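A minimal sketch of the proposed INDEX_REG_CLASS_FOR_MODE hook, assuming the usual #ifndef-default pattern for target macros. The enums below are toy stand-ins, not GCC's real definitions, and the SH override (and placing the default in something like defaults.h) is hypothetical.

```c
#include <assert.h>

/* Toy stand-ins for GCC internals (hypothetical names and values). */
enum reg_class { NO_REGS, INDEX_REGS, GENERAL_REGS };
enum machine_mode { SImode, DImode };

#define INDEX_REG_CLASS INDEX_REGS

/* What an SH header might provide for SHmedia code: DImode index
   registers come from GENERAL_REGS. */
#define INDEX_REG_CLASS_FOR_MODE(MODE) \
  ((MODE) == DImode ? GENERAL_REGS : INDEX_REG_CLASS)

/* Generic default: targets that do not define the macro keep the
   current behavior. */
#ifndef INDEX_REG_CLASS_FOR_MODE
#define INDEX_REG_CLASS_FOR_MODE(MODE) INDEX_REG_CLASS
#endif

/* find_reloads_address_1 would pass GET_MODE (x) here instead of
   hard-coding the DImode test as in the patch above. */
static enum reg_class index_context_class (enum machine_mode mode)
{
  return INDEX_REG_CLASS_FOR_MODE (mode);
}
```

The design choice is that reload.c stays target-neutral: only the SH port knows that SHmedia addresses use general registers for DImode indexes.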
[Bug tree-optimization/26830] [4.1/4.2 Regression] Insane amount of compile-time / memory needed at -O1 and above
--- Comment #24 from paolo dot bonzini at lu dot unisi dot ch 2006-03-31 07:37 --- Subject: Re: [4.1/4.2 Regression] Insane amount of compile-time / memory needed at -O1 and above Note that the regression is in 4.1, too, so we should consider backporting changes that accumulate here to the branch after a while. Sure, but I am a bit nervous about backporting right away a change to parts I am not familiar with. Let's wait until a week after *all* patches are applied. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26830
[Bug bootstrap/25435] stage build no longer works
--- Comment #6 from paolo dot bonzini at lu dot unisi dot ch 2006-03-21 15:53 --- Subject: Re: stage build no longer works hjl at lucon dot org wrote: --- Comment #5 from hjl at lucon dot org 2006-03-21 15:09 --- When I make a backend change, make at the top level still tries to rebuild all libraries, even though nothing in the libraries will be recompiled. When Java is enabled, it may take a while just to check if libjava needs to be recompiled. Is there an option to say I only want to rebuild the compiler, not the libraries? One particular stage: `make all-stageN'. All stages: `make stageN-bubble' (usually N=3, of course). All this is now documented in GCC's manual. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25435
[Bug bootstrap/26582] [4.2 Regression] warning with cross build
--- Comment #1 from paolo dot bonzini at lu dot unisi dot ch 2006-03-06 14:35 --- Subject: Re: New: [4.2 Regression] warning with cross build pinskia at gcc dot gnu dot org wrote: I get the following warnings when doing a cross (any kind of cross really) Makefile:13366: warning: overriding commands for target `restrap' Makefile:12658: warning: ignoring old commands for target `restrap' I was aware of this, did nothing so far because Dan Jacobowitz said he would rip old bootstrap bits soon (including this one). Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26582
[Bug bootstrap/25790] make clean fails
--- Comment #3 from paolo dot bonzini at lu dot unisi dot ch 2006-01-20 17:21 --- Subject: Re: make clean fails aoliva at gcc dot gnu dot org wrote: --- Comment #2 from aoliva at gcc dot gnu dot org 2006-01-20 17:16 --- If you mean make -k for sub-makes, yes. But `make clean && make && make check' ought to work, and not stop after make clean because it looks like it failed. Yes. In this case, anyway, the problem is that because of a pasto I left out a ; \ in r104978. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25790
[Bug bootstrap/25790] make clean fails
--- Comment #1 from paolo dot bonzini at lu dot unisi dot ch 2006-01-16 07:56 --- Subject: Re: New: make clean fails aoliva at gcc dot gnu dot org wrote: It appears that make clean (on a native bootstrap) always fails for me. After make clean on a tree containing a build interrupted in stage 2, it fails trying to make clean in stage 3 gcc. After make clean on a successfully-completed bootstrap tree, it fails trying to make clean in stage 4 gcc. I believe that targets such as clean, check, and similar, ought to use make -k. Would you be ok with this? Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25790
[Bug tree-optimization/24123] [4.1/4.2 Regression] Massive performance regression for -ffast-math due to the recip tree pass
--- Comment #25 from paolo dot bonzini at lu dot unisi dot ch 2006-01-04 16:29 --- Subject: Re: [4.1/4.2 Regression] Massive performance regression for -ffast-math due to the recip tree pass For PowerPC, it is effective to use the instruction if there are multiple divides, such as the three divisions mentioned above. The IBM XLC compiler propagates the reciprocal and numerator pair through its equivalent to RTL. I am not sure I follow you. I see two questions, but it could be that you asked neither: 1) You want to use the reciprocal instruction instead of an FP divide. This could be done in the expander, or with a new RECIP_EXPR tree code. I'd rather do the former. 2) Multiplying by the result of the reciprocal instruction always provides the exact result of the division, so you want to enable the pass always. It could be possible to parameterize the recip pass on a separate -fdivide-by-reciprocals flag, turned on by -funsafe-math-optimizations, but also always turned on in config/rs6000/rs6000.c. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24123
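At the source level, the recip pass's rewrite of several divisions by the same divisor looks roughly like the two functions below (a sketch with made-up names, not the pass's actual output). Each rewritten result is rounded twice, once for 1.0 / d and once for the product, so it can differ from the correctly rounded quotient in the last bit.

```c
#include <assert.h>

/* As written: one division per element. */
static void divide_each(const double num[3], double d, double out[3])
{
  for (int i = 0; i < 3; i++)
    out[i] = num[i] / d;
}

/* After the reciprocal transformation: one division, then one
   multiplication per element. */
static void divide_by_reciprocal(const double num[3], double d, double out[3])
{
  const double r = 1.0 / d;
  for (int i = 0; i < 3; i++)
    out[i] = num[i] * r;
}
```

With d = 49, 49.0 / 49.0 is exactly 1.0 but 49.0 * (1.0 / 49.0) is not, a small instance of why the transformation is considered unsafe. A hardware reciprocal-estimate instruction would typically be less accurate still than the correctly rounded 1.0 / d used here.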
[Bug bootstrap/25435] stage build no longer works
--- Comment #3 from paolo dot bonzini at lu dot unisi dot ch 2005-12-16 08:02 --- Subject: Re: stage build no longer works hjl at lucon dot org wrote: --- Comment #2 from hjl at lucon dot org 2005-12-16 07:37 --- I made a change to i386.c. I just want to rebuild the final compiler with the stage 2 compiler. I don't want to rebootstrap the whole gcc. What should I do? You can do make all-stage3 (which is the new name of restageN). Or just do make which will do the equivalent of a bubblestrap (it does not recompile everything, but it does percolate the change through the whole set of stages, so that you do not get spurious comparison failures). Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25435
[Bug target/25259] bootstrap failures on non-C99 platforms
--- Comment #4 from paolo dot bonzini at lu dot unisi dot ch 2005-12-07 10:52 --- Subject: Re: bootstrap failures on non-C99 platforms I bootstrapped this on i686-pc-linux-gnu, all languages. Eric, can you test it on a non-C99 platform? I don't seem to be able to regenerate aclocal.m4 and configure correctly, so the new test GCC_HEADER_STDINT is not expanded. You have to do aclocal -I ../config, or you can configure with --enable-maintainer-mode. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25259
[Bug middle-end/24851] [4.1 Regression] f2c miscompilation
--- Comment #17 from paolo dot bonzini at lu dot unisi dot ch 2005-11-16 09:41 --- Subject: Re: [4.1 Regression] f2c miscompilation rguenth at gcc dot gnu dot org wrote: --- Comment #16 from rguenth at gcc dot gnu dot org 2005-11-16 09:39 --- Is the second reduced testcase not fine from a standards POV? I.e.
void abort(void);
int main()
{
  int a[10], *p, *q;
  q = &a[1];
  p = &q[-1];
  if (p >= &a[9])
    abort ();
  return 0;
}
or does "array" in the standard refer to q[0]..q[8] here? No, this is fine. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24851
[Bug middle-end/24853] scheduling takes 40% or more time
--- Comment #4 from paolo dot bonzini at lu dot unisi dot ch 2005-11-14 18:26 --- Subject: Re: scheduling takes 40% or more time Is it the first scheduling pass? If so, we have a patch at AdaCore to limit its explosion. Yes, it is. schedule_insns2 takes nothing. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24853
[Bug target/24230] [4.1 Regression] ICE in extract_insn with altivec
--- Comment #17 from paolo dot bonzini at lu dot unisi dot ch 2005-10-28 19:16 --- Subject: Re: [4.1 Regression] ICE in extract_insn with altivec On IRC it was suggested that we just need to get a version of easy_vector_constant which does the right thing in any mode. Yes, it looks like the bug is that the constant is declared easy while it is in V8HI mode, but not when the reload is done in V16QI mode. It may make sense to assert !reload_in_progress && !reload_completed before force_const_mem is called. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24230
[Bug target/18631] [4.0 Regression] missing error messages passing vectors with -mno-altivec -mabi=altivec
--- Comment #4 from paolo dot bonzini at lu dot unisi dot ch 2005-10-22 09:52 --- Subject: Re: [4.0 Regression] missing error messages passing vectors with -mno-altivec -mabi=altivec I *think* it is also fixed on 4.0; a grep for the error message in config/rs6000/rs6000.c would confirm or deny this. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18631
[Bug middle-end/24151] [4.0/4.1 Regression] gcc.dg/asm-1.c (test for excess errors) fails
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-10-01 11:00 --- Subject: Re: [4.0/4.1 Regression] gcc.dg/asm-1.c (test for excess errors) fails pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-30 19:36 --- Confirmed. As noted before this checking really should be in the front-end and it seems it also needs a check for error_mark_node too. I'm testing a patch which moves the warning to the front-ends (which is anyway unrelated to this PR) and checks for error_mark_node. asm-1.c, by the way, is compiled (but of course not assembled) on every target. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24151
[Bug tree-optimization/24123] [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-30 12:53 --- Subject: Re: [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass It looks to me that header is reversed! pov::sbisect is 1.50 _with_ recip. ehm, right. indeed i was looking at sbisect now ;-) Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24123
[Bug tree-optimization/24123] [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-30 14:13 --- Subject: Re: [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass Currently, there seem to be some problems, i.e.:
double pov::f_polytubes(double*, unsigned int) (ptr, D.22748)
-  D.22787_46 = -6.28318530717958623199592693708837032318115234375e+0 / D.22783_80;
+  reciptmp.882_72 = 1.0e+0 / D.22783_80;
+  D.22787_46 = -6.28318530717958623199592693708837032318115234375e+0 * reciptmp.882_72;
Not needed, only one user. No, there is another user in another basic block.
Function double pov::POVFPU_RunDefault(pov::FUNCTION)
L193:;
-  r0_1660 = r0_89 / r0_89;
+  reciptmp.492_84 = 1.0e+0 / r0_89;
+  r0_1660 = r0_89 * reciptmp.492_84;
  goto <bb 1062> (L1339);
The result of the above confusion is (1.0)! We are in fast-math, so no NaNs, etc.
void pov::Simulate_Media(pov::IMEDIA**, pov::RAY*, pov::INTERSECTION*
-  reciptmp.1152_907 = 1.0e+0 / prephitmp.1124_293;
-  reciptmp.1153_1046 = 1.0e+0 / prephitmp.1124_293;
-  reciptmp.1154_1041 = 1.0e+0 / prephitmp.1124_293;
+  reciptmp.1275_1270 = 1.0e+0 / prephitmp.1124_293;
+  reciptmp.1152_907 = 1.0e+0 * reciptmp.1275_1270;
+  reciptmp.1153_1046 = 1.0e+0 * reciptmp.1275_1270;
+  reciptmp.1154_1041 = 1.0e+0 * reciptmp.1275_1270;
These are all the same? There are already reciptmp variables in loopdone that are all the same. And there are quite a few places that have this problem. I think the final DOM run ought to simplify the other problems. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24123
[Bug tree-optimization/24123] [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-30 14:40 --- Subject: Re: [4.1 Regression] Massive performance regression for -ffast-math due to the recip tree pass
Function double pov::POVFPU_RunDefault(pov::FUNCTION)
L193:;
-  r0_1660 = r0_89 / r0_89;
+  reciptmp.492_84 = 1.0e+0 / r0_89;
+  r0_1660 = r0_89 * reciptmp.492_84;
  goto <bb 1062> (L1339);
The result of the above confusion is (1.0)! We are in fast-math, so no NaNs, etc. Can you prepare a patch to fold that fixes this beauty? :-) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24123
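The fold being asked for is a pattern match of x * (1.0 / x), in either operand order, to 1.0, valid only under -ffast-math-style assumptions (no NaNs, no infinities, so x == 0 can be ignored). The sketch below uses toy expression nodes that are hypothetical and unrelated to GCC's tree representation; a real fix in fold would also have to see through the SSA temporary (reciptmp) that the recip pass introduces.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy expression nodes (hypothetical; not GCC's trees). */
enum kind { K_VAR, K_CONST, K_MULT, K_RDIV };

struct expr {
  enum kind kind;
  int var;                       /* K_VAR: variable id */
  double cst;                    /* K_CONST: value */
  const struct expr *op0, *op1;  /* K_MULT, K_RDIV: operands */
};

static bool is_var(const struct expr *e, int var)
{
  return e->kind == K_VAR && e->var == var;
}

/* If E is (1.0 / v) for a variable v, return v's id, else -1. */
static int recip_of_var(const struct expr *e)
{
  if (e->kind == K_RDIV && e->op0->kind == K_CONST && e->op0->cst == 1.0
      && e->op1->kind == K_VAR)
    return e->op1->var;
  return -1;
}

/* Fold x * (1.0 / x) and (1.0 / x) * x to 1.0.  Returns true and
   stores the folded constant when the pattern matches. */
static bool fold_mult_recip(const struct expr *e, double *result)
{
  if (e->kind != K_MULT)
    return false;
  int v = recip_of_var(e->op1);
  if (v >= 0 && is_var(e->op0, v)) { *result = 1.0; return true; }
  v = recip_of_var(e->op0);
  if (v >= 0 && is_var(e->op1, v)) { *result = 1.0; return true; }
  return false;
}
```

Note that y * (1.0 / x) with a different variable is left alone, as it must be.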
[Bug target/19653] x87 reg allocated for constants for -mfpmath=sse
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-21 06:51 --- Subject: Re: x87 reg allocated for constants for -mfpmath=sse Note that in this pattern cost computation of MMX_REGS are all ignored ('*' in front of y). So, the cost which is computed is for 'r' which is GENERAL_REGS. This cost is too high and eventually results in memory cost to be lower than register cost. I tried the following simple patch as experiment and got all the performance back (it is now comparable with 4.0). Note that in this patch, I removed the '*' in the 2nd alternative so cost of keeping the operand in mmx_regs class is factored in. This resulted in a lower cost than that of memory. Is this the way to go? This is just an experiment which seems to work. I think it makes sense. The x86 back-end is playing too many tricks (such as the # classes) with the register allocator and regclass especially, and they are biting back. Still, I'd rather hear from an expert as to why the classes were written like this. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19653
[Bug tree-optimization/23948] [4.1 Regression] internal compiler error: verify_stmts failed
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-09-21 14:33 --- Subject: Re: [4.1 Regression] internal compiler error: verify_stmts failed rguenth at gcc dot gnu dot org wrote: --- Additional Comments From rguenth at gcc dot gnu dot org 2005-09-21 14:18 --- We insert the reciprocal computation correctly after the call to double prrs = potentially_runnable_resource_share(); but as this call may trap and is the last instruction in the basic block, inserting there is obviously bogus. We'd need to insert a new BB or need a way to insert on the EXIT_EDGE. And make sure critical edges are split. No, I think we have to rethink the place where we insert the division. It needs to be closer to the divide (just before), not right after the definition. When we have flag_trapping_math, this is quite hard to do (there may be even multiple places to insert the divide!). I also did not understand why you had to fiddle with postdominators :-) to fix PR23309. I have a prototype patch but it will probably be a while before I can sit, look if it really works, and test it properly. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23948
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-08-17 20:07 --- Subject: Re: [meta-bug] optimizations that CSE still catches
unsigned outcnt;
extern void flush_outbuf(void);
void bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
  outbuf[outcnt] = bi_buf;
  if (outcnt == 16384)
    flush_outbuf();
  outbuf[outcnt] = bi_buf;
}
Presumably the store into outbuf prevents the SSA optimizers from commonizing the first two loads of outcnt and the call to flush_outbuf prevents the SSA optimizers from commonizing the last load of outcnt on the path which bypasses the call to flush_outbuf. Right? Not really. First of all, as stevenb pointed out on IRC, this is quite specific to powerpc-apple-darwin and other targets where programs are compiled as PIC by default. Steven's SPEC testing under Linux has not shown this behavior, but shared libraries there *will* suffer from the same problem! We'd want the code to become
void bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
  int t1 = outcnt;
  outbuf[t1] = bi_buf;
  int t2 = outcnt, t3;
  if (t2 == 16384) {
    flush_outbuf();
    t3 = outcnt;
  } else
    t3 = t2;
  outbuf[t3] = bi_buf;
}
If we disable CSE path following, and keep only one GCSE pass, we waste the opportunity to do this optimization, because we generate temporaries for the partially redundant address of outcnt. With two GCSE passes, the second is able to eliminate the partially redundant load. Of course what we really miss is load PRE on the tree level, but it is good that --param max-gcse-passes=2 can be a replacement for -fcse-skip-blocks -fcse-follow-jumps. Testing mainline GCC against a patch including no path following + 2 GCSE passes + my forward propagation pass, I'm seeing SPEC improvements of +2 to +8% on powerpc-apple-darwin. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
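A small harness can check that the hand-optimized form computes the same thing as the original. It assumes a hypothetical flush_outbuf that simply resets outcnt to 0 (the real one also writes the buffer out) and uses unsigned temporaries to match outcnt's type. The reload after the store is needed because outbuf is an unsigned char pointer and may alias outcnt; the reload after the call is needed because flush_outbuf may change outcnt.

```c
#include <assert.h>
#include <string.h>

unsigned outcnt;

/* Hypothetical stand-in: emptying the buffer resets the count. */
static void flush_outbuf(void) { outcnt = 0; }

/* As in the report. */
static void bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
  outbuf[outcnt] = bi_buf;
  if (outcnt == 16384)
    flush_outbuf();
  outbuf[outcnt] = bi_buf;
}

/* The desired form: the load of outcnt on the path that skips the
   call is reused instead of being repeated. */
static void bi_windup_pre(unsigned char *outbuf, unsigned char bi_buf)
{
  unsigned t1 = outcnt, t2, t3;
  outbuf[t1] = bi_buf;
  t2 = outcnt;          /* the store above may have modified outcnt */
  if (t2 == 16384) {
    flush_outbuf();
    t3 = outcnt;        /* the call may have modified outcnt */
  } else
    t3 = t2;
  outbuf[t3] = bi_buf;
}
```

Running both versions from the same starting outcnt, once on the flush path (outcnt == 16384) and once on the fall-through path, should leave identical buffers and counters.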
[Bug tree-optimization/21639] poisoned ggc memory used for -ftree-vectorize
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-05-24 11:59 --- Subject: Re: poisoned ggc memory used for -ftree-vectorize Paolo, is the above solution ok with you? If so, I'll go ahead and prepare a patch. Alternatively, if ggc_collect is really required here, then adding a call to scev_reset() would also solve the problem (again, in tree-complex.c): Is there a rule that ggc_collect should not be called during loop optimizations? If not, you should not have to modify my pass since it is only showing the problem by doing TODO_ggc_collect. You have to make sure that scev_reset is called at the appropriate time, or that the contents of the scalar evolution hash table are reached during garbage collection. In other words, Janis is correct saying that this is a latent bug. Doing a TODO_ggc_collect after lower_vector_ssa may be overkill, and I'm ok with removing it, but then you are only papering over the real problem. If you have some time to spare, use a --enable-checking=gcac compiler, and the failure will likely happen much earlier. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21639
[Bug tree-optimization/21639] poisoned ggc memory used for -ftree-vectorize
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-05-24 14:26 --- Subject: Re: poisoned ggc memory used for -ftree-vectorize There are several places in loop opts that are not GGC-safe (in particular the tree nodes referred to from loop structures are not seen by the garbage collector, and I think there are some other instances). So at the moment, you cannot run ggc_collect inside loop opts. Is this going to change? If not, I guess removing TODO_ggc_collect is the simplest thing to do. But then a PR on this should be opened, or this one should be modified appropriately, as an enhancement request for making loop optimizations GGC-safe. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21639
[Bug java/17845] [4.1 Regression] More problems with simple type names as superclasses
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-05-11 12:31 --- Subject: Re: [4.1 Regression] More problems with simple type names as superclasses I saw something like this before in a different bug. It must have been PR21436, which I also reported/distilled/fixed. But it is not a dup, because no imports are involved in this case. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17845
[Bug tree-optimization/17687] [4.1] sincos can be folded at the tree level
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-03-15 06:36 --- Subject: Re: [4.1] sincos can be folded at the tree level Paolo, are you going to submit this one? Yes, but I am way too busy at work now. Maybe as soon as Thursday. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
[Bug bootstrap/17383] [4.0 Regression] Building in src dir fails
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-02-09 12:30 --- Subject: Re: [4.0 Regression] Building in src dir fails I have considered doing this in the truly parallel way: namely, introducing HOST_SUBDIR to go along with BUILD_SUBDIR and TARGET_SUBDIR. I already have a patch for this queued for 4.1 and posted to gcc-patches. DJ however has a hackish fix that will go in once 4.0 branches. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17383
[Bug target/19528] [4.0 regression] missing ra.h
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-01-19 12:55 --- Subject: Re: [4.0 regression] missing ra.h steven at gcc dot gnu dot org wrote: --- Additional Comments From steven at gcc dot gnu dot org 2005-01-19 12:26 --- ...and remove the #include "ra.h" of course. Doh. I'm sorry for the breakage, but why the heck does the SH back-end have to include it? Why does the SH back-end have to behave differently from every other one? Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19528
[Bug bootstrap/17383] [4.0 Regression] Building in src dir fails
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2004-12-26 11:00 --- Subject: Re: [4.0 Regression] Building in src dir fails aoliva at gcc dot gnu dot org wrote: --- Additional Comments From aoliva at gcc dot gnu dot org 2004-12-23 21:03 --- Paolo's patch looks good to me. Paolo, would you please check it in? Yes; will you please take a look at the comment at http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01079.html ? Thanks, Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17383
[Bug target/17836] [4.0 Regression] ABI breakage for 16-byte vectors (non-Altivec ABI ISA)
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2004-11-23 08:19 --- Subject: Re: [4.0 Regression] ABI breakage for 16-byte vectors (non-Altivec ABI ISA) patches committed Thank you very much. Sorry for the misunderstandings. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17836
[Bug tree-optimization/18308] ICE in do_jump, at dojump.c:274
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2004-11-18 08:03 --- Subject: Re: ICE in do_jump, at dojump.c:274 And that would mean it was caused by: * dojump.c (do_jump) <COND_EXPR, EQ_EXPR, NE_EXPR, TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR, COMPOUND_EXPR>: Abort on gimplified cases. While I can work on a fix, those COND_EXPRs were *not* valid GIMPLE at the time the patch was written. Paolo -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18308