[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #12 from wilson at codesourcery dot com 2010-06-17 04:29 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Wed, 2010-06-16 at 13:29 +, bernds at gcc dot gnu dot org wrote: > Could you retest the MIPS fixed-point testcases with the obvious fix? You > probably have the MIPS toolchain set up already and it would probably take me > more time. Unfortunately, the fix is not quite that easy. We need to handle saturating types in the optab check. I already wrote the multiply add check that way, but the widening-multiply check needs to be fixed. It turns out that there isn't any obtab for saturating widening multiply, so we have to add one. Also, there is also no pattern for it in the mips-fixed.md file. There is only the saturating widening multiply accumulate pattern. I believe I can fix this by adding a multiply accumulate pattern that adds zero, but now I'm starting to get cascading patches here. This could take a few days. Since I was looking at adding zero, I couldn't help but notice that the MIPS fixed point support seems incomplete. _Sat long _Fract f1 (void) { return 0; } gives me j $31 lw $2,.LC0 .LC0: .word 0x0 But this can be split off into a separate bug report. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #10 from wilson at codesourcery dot com 2010-06-16 06:30 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Wed, 2010-06-09 at 20:21 +, bernds at gcc dot gnu dot org wrote: > What do you think? Please let me know what your MIPS tests turned up. I'm looking at this again. My MIPS tests showed that my patch fixed 17 gcc.target/mips multiply-accumulate testcases that were accidentally broken by the new tree level widening multiply optimization pass. My new testcase madd-9.c failed, but it turned out that I accidentally double-patched the file. Fixing the file, it now passes. I forgot to include this testcase in my second patch, though it was there in the first one. The bad news is that there are two new failures: dpaq_sa_l_w.c and dpsq_sa_l_w.c. These are MIPS DSP fixed-point multiply-accumulate testcases, which is something I definitely didn't bother to check earlier. Overall I would say it is a win, since it fixes many int/long/long long examples, and only breaks a few obscure cases that should be fixable with a little more work. I'm looking at your patch too. There is a testcase that doesn't appear to belong, gcc.target/arm/pr42172-1.c. I'm not sure why convert_plusminus_to_widen needs to check for MULT and call convert_mult_to_widen. If we are scanning basic blocks from top to bottom, it seems that the multiplies would have already been checked. But maybe this has something to do with looking at operands computed in other basic blocks that haven't been scanned yet, in which case it would make sense. I'm not sure if that is possible. Otherwise, it looks like you have completed and cleanup up a bunch of stuff that I didn't get far enough to notice. It all looks good to me. I can try rerunning MIPS testcases to see how it works. I see that the failure of the DSP fixed-point tests is because we have checks for + if (TREE_CODE (type) != INTEGER_TYPE) +return false; and the old code in expr.c that we are replacing does - if ((TREE_CODE (type) == INTEGER_TYPE - || TREE_CODE (type) == FIXED_POINT_TYPE) So that looks like an easy fix (assuming no other complications), but will require a rebuild and retest. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #8 from wilson at codesourcery dot com 2010-06-07 22:18 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Mon, 2010-06-07 at 21:34 +, bernds at gcc dot gnu dot org wrote: > Jim, are you still working on this or should I pick it up? I'm working on it, but only a few hours a week, so my progress will be slow. If you have more time that would help. I put a patch in the PR that seems to be working well, but there were a few minor issues that turned up during a MIPS testsuite run that I wanted to look at before formally submitting the patch. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug target/43902] suboptimal MIPS widening multiply accumulate
--- Comment #3 from wilson at codesourcery dot com 2010-05-03 22:28 --- Subject: Re: suboptimal MIPS widening multiply accumulate On Tue, 2010-04-27 at 09:33 +, rguenth at gcc dot gnu dot org wrote: > For more general optimization you might want to move all this code to > the tree pass before expansion that detects widening multiplication. > The DOT_PROD_EXPR tree code can be used to carry the information to > the expander. I didn't understand the point here at first, but after updating my tree and rebuilding I now see Bernd's patch to add the new widening_mul optimization pass which I hadn't noticed before. The MIPS macc/madd support is now even more broken than before. I can generate them at -O but not at -O2 now, as the code in expand_expr_real_2 doesn't handle WIDEN_MULT_EXPR which is created at -O2 by the new code. So as Richard Guenther mentioned, the solution seems to be to remove the old code in expand_expr_real_2 and add something new in tree-ssa-math-opt.c to replace it. This means we won't be able to generate widening multiply accumulate with -O, but we will at -O2. This is probably acceptable, though it makes the solution more complicated than I had originally expected. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902
[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces
--- Comment #3 from wilson at codesourcery dot com 2010-04-23 22:40 --- Subject: Re: New: Emit debug info allowing inlined functions to show in stack traces On 04/21/2010 02:26 AM, scovich at gmail dot com wrote: > It would be very nice if gcc emitted debug information that allowed profilers > and debuggers the option to extract a stack trace which included calls to > inlined functions. This would allow developers much greater insight into the > behavior of optimized code. Try the -i option of addr2line. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828
[Bug middle-end/43760] [4.6 regression] New test failures
--- Comment #4 from wilson at codesourcery dot com 2010-04-20 01:54 --- Subject: Re: [4.6 regression] New test failures On Tue, 2010-04-20 at 00:01 +, sje at cup dot hp dot com wrote: > Any ideas on how to fix the compiler? The best idea I could come up with was > to check each basic block to see if there are any conditional sets of > predicate > registers in it and add pred.rel.mutex lines for those registers. If we add > pred.rel.mutex lines for all used predicate registers we wind up adding a lot > of unneeded pred.rel.mutex statements. I haven't had a chance to look at all of the details yet. I believe that we only need to handle the case when one mutex pair is used to create another one, e.g. something like (p17) cmp.geu p6, p7 = r42, r51 (p16) cmp.gtu p6, p7 = r42, r51 We also only need to handle the case where we have a call-clobbered mutex pair, e.g. p6/p7 in this case, and there was a call insn in between here and any previous set of p6/p7. Currently, we only insert pred.rel.mutex lines at the beginning of basic blocks that start with labels. So now we also need to emit them after such sequences, or alternatively after calls. I'm not sure if the assembler is wrong here. mutex just means that we will never have both p16 and p17 set at the same time. So there are 3 possibilities, p16 true, p17 true, neither true. If p16 is true, then clearly p6/p7 will be mutex. Likewise if p17 is true. But if neither is true, then p6/p7 will be unmodified, and hence aren't guaranteed to be mutex unless they were already mutex before these lines. This is why it only fails when we have a call-clobbered mutex pair after an intervening call, as now we don't know anything about the previous values of p6/p7. If we know that p16/p17 were set in a simple compare, then we will know that 1 and only 1 will be true, which is a stronger condition than mutex, and in this case p6/p7 will be mutex. There is no standard .pred.rel syntax for this though, which means that if the p16/p17 set was in a different basic block, and we have only the .pred.rel.mutex line inserted by the compiler, then the assembler can't prove that p6/p7 will be mutex after this point. I'm not eager to extend the assembler syntax to fix this, and rewrite the DV code in the assembler, so I'm hoping that we can fix it in the compiler by emitting another .pred.rel.mutex line in this case. This hopefully doesn't happen too often and isn't too hard. OK, maybe I have studied it a little, but I need to look at this a bit more before I'm sure I understand everything that is going on here. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43760
[Bug driver/43687] Unexpected error message for bad command line argument
--- Comment #5 from wilson at codesourcery dot com 2010-04-09 22:16 --- Subject: Re: Unexpected error message for bad command line argument On 04/09/2010 02:34 PM, wilson at gcc dot gnu dot org wrote: > POSIX says that for command line arguments "-a -d", "-d -a", "-da", and "-ad" > are all equivalent. Sorry, this is an overstatement. I didn't mean to imply that option order doesn't matter. So I should have said that "-d -a" and "-da" are equivalent, which is not true in gcc. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43687
[Bug rtl-optimization/43520] gcc.dg/pr43058.c uses way too memory on ia64
--- Comment #3 from wilson at codesourcery dot com 2010-03-25 23:21 --- Subject: Re: gcc.dg/pr43058.c uses way too memory on ia64 On Thu, 2010-03-25 at 19:05 +, pinskia at gcc dot gnu dot org wrote: > Did you test earlier versions of GCC with the testcase? I hadn't gotten around to it yet. I am at Cisco Tue-Thu, and I can't do much IA-64 work these days. It compiles OK with gcc-4.4. > Also it should not hard to figure out where in the scheduler the memory is > being used. The kernel panic is not our fault so make sure the kernel guys > get > a bug report also; the memory usage is though. Another thing I hadn't gotten around to yet. The problem appears to be Vlad's -fsched-pressure stuff. reg_last->implicit_sets has about 5000 instructions on the list. If we have 5000 insns each of which is getting a dependency on each other, that is 25M dependencies, at 40 bytes each, which is 1GB, not counting other overhead. There is also memory being allocated somewhere else, as this testcase needs more than 3GB to be compiled. It appears that these lists need some kind of throttling to prevent them from getting too big. Or maybe they aren't being constructed correctly for IA-64. -fno-sched-pressure doesn't help, as that does not disable creation of the implicit register sets. I can spend more time looking at this later. Everybody knows the linux kernel oom killer has problems, and that there are no easy solutions. I don't need to report that. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43520
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #13 from wilson at codesourcery dot com 2010-03-23 02:11 --- Subject: Re: [ia64] Inappropriate address spills On Mon, 2010-03-22 at 20:48 +, sje at cup dot hp dot com wrote: > Since the proposed patch to meant to address non-optimimal code generation I > decided to try the patch with SPEC2006 and see if it helped the performance. This isn't a simple issue, performance wise. If the compiler puts an address+offset into sdata, then we get something like .sdata .align 8 .LC2: data8 mv_tables#+72 .text addl r14 = @gprel(.LC2), gp ;; ld8 r40 = [r14] If the compiler emits the code that the ABI intended here, we get something like addl r14 = @ltoff(mv_tables#), r1 ;; ld8 r14 = [r14] ;; adds r40 = 72, r14 This sequence is one instruction longer. We deliberately do this to avoid accidentally overflowing the got. If you access the same array with a thousand different offsets, then instead of putting a thousand entries into the got, we put one entry in the got, and then add in the offset. So this is a smaller got, but an extra add instruction that may add to latency if we can't hide it. This code is working for got entries, but not when an address is forced into constant pool. So which one is better depends on what the code looks like. If you have a small number of variables and a large number of offsets, then you will get a lot more memory references and a larger got with the first alternative. If you have a large number of variables and a small number of offsets, then you will get more add instructions with the second alternative. This issue originally came up via PR 41567, which triggers a linker error. Perhaps that testcase would not have triggered the linker error if we were using the second alternative. I haven't looked at this PR as yet, so I don't know the details of what is going on here. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #11 from wilson at codesourcery dot com 2010-03-18 00:12 --- Subject: Re: [ia64] Inappropriate address spills On Wed, 2010-03-17 at 23:47 +, sje at cup dot hp dot com wrote: > Reading Richard's initial comment I thought the problem was that the code was > (or could be) illegal because the relocation may be out of range and we > shouldn't > use the gprel relocation for any of these constant pool references. The code is invalid only if you use -mno-sdata. The only reason why Richard wanted to use -mno-sdata is because he saw the non-optimal code, and hoped that -mno-sdata would fix that. It didn't. It just made the problem worse by generating invalid code. So there is probably also a bug here in the handling of -mno-sdata, but fixing that doesn't help Richard, because it doesn't solve the real problem he was complaining about, which was the non-optimal code. I suspect that the -mno-sdata bug is that ia64_expand_load_address does else if (sdata_symbolic_operand (src, VOIDmode)) emit_insn (gen_load_gprel (dest, src)); and sdata_symbolic_operand fails to check TARGET_NO_SDATA, so it assumes that constant pool entries are still in the sdata section, and we end up with gprel references to items that aren't in the sdata section anymore, which will fail at link time. If we fix that, -mno-sdata will work, but we will still have all of the inappropriate address spills (i.e. non-optimal code) which is what the PR summary is complaining about. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #9 from wilson at codesourcery dot com 2010-03-17 23:25 --- Subject: Re: [ia64] Inappropriate address spills On Wed, 2010-03-17 at 22:09 +, sje at cup dot hp dot com wrote: > I tried the patch and didn't have any problem bootstrapping and I didn't see > any regressions. It also fixed my small test case, but when I went back and > tried some of the other tests from PR 28490 I still saw some of the bad gprel > usages. Here is a slightly more cutdown testcase from 28490 that still fails > with the patch applied and when compiling with -O2. I already mentioned in my previous message that the patch does not eliminate all instances of the gprel usages (constant-pool references). It just eliminates most of them. If you want to eliminate all of them, then you need to get LEGITIMATE_PIC_OPERAND_P working, which in turn will require setting flag_pic by default. which in turn will require verifying that this doesn't break anything. That is probably more work than I have time for. And by "fails", you mean that the compiler is still emitting undesirable code in some cases. The code isn't incorrect, just non-optimal. And we get less non-optimal code with the patch, so it seems to be better than nothing. Unless you want to try to write the better LEGITIMATE_PIC_OPERAND_P solution. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/43348] [4.4 regression] ICE in final_scan_insn, at final.c:2604
--- Comment #6 from wilson at codesourcery dot com 2010-03-17 01:46 --- Subject: Re: New: [4.4 regression] ICE in final_scan_insn, at final.c:2604 The testcase has >if (newCapacity > std::numeric_limits::max() / sizeof(T)) > do { *(int *)(uintptr_t)0xbbadbeef = 0; ((void(*)())0)(); } > while(false); which is not very C++ish, but none the less should "work". The problem is in the ia64.md file. The call patterns use the predicate call_operand which accepts only symbol_ref and register, and constraints that include a register "b" and "i". The call operand remains a register until reload, which sees that we have a register equivalent to zero and an "i" constraint and does the substitution. After reload, we then fail, as call_operand does not accept const_int. We need to fix the ia64.md call patterns to use "s" instead of "i". This bug has apparently been present since the port was submitted in gcc-3.0. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43348
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #5 from wilson at codesourcery dot com 2010-03-13 08:23 --- Subject: Re: [ia64] Inappropriate address spills On third thought... The code here makes sense if we were having problems with invalid constant recombinations. symbol+const gets split by ia64_expand_move into symbol+highpart which is put in the got, and lowpart which is added with adds. If a late optimization pass saw a REG_EQUAL note and tried to put the constant back together again, that would be inconvenient. The ia64_legitimate_constant_p code would prevent that. Unfortunately, this code also prevents us to accepting a constant before ia64_expand_move is called, which prevents us from splitting it in the first place. This looks to be more complicated than I hoped. There is a variable we can check to see if we are in the expand phase, currently_expanding_to_rtl. Perhaps we need to accept any constant if currently_expanding_to_rtl is true, and only accept symbol+highpart (the current code) if it is false. Or alternatively, if this problem only happened at reload time, then checking reload_in_progress and/or reload_completed might work. I will have to look at PR28490 again and try to remember what the original problem that we were fixing was. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #4 from wilson at codesourcery dot com 2010-03-13 03:16 --- Subject: Re: [ia64] Inappropriate address spills Or maybe we should just accept any constant here? I tried that, and for typedef struct table { int a; int b; int c; } table; extern table mv_tables[10]; void foo() { init_mv_table(&mv_tables[5]); } I get addl r35 = @ltoff(mv_tables#+606208), r1 ;; ld8 r35 = [r35] ;; adds r35 = -6208, r35 which is perhaps still better than using @gprel and the constant pool. In this case, in ia64_legitimate_constant_p we can just return true if aligned_offset_symbol_operand is true, and there is no need for the addend local variable. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/42040] [ia64] Inappropriate address spills
--- Comment #2 from wilson at codesourcery dot com 2010-03-13 03:03 --- Subject: Re: [ia64] Inappropriate address spills This broke between gcc-4.0.0 and gcc-4.1.2. It appears to be the patch for PR 28490. There is a test in ia64_legitimate_constant_p for symbol +offset, where we require that the offset have the low 14-bits be zero which is not at all what we intended to test for. This should be a test for an offset that fits into a short add immediate instruction, which takes a signed-14 bit value, aka CONST_OK_FOR_I, or, hmm, let's look that up, I guess that would be satisfies_constraint_I nowadays. Except that satisfies_constraint_I wants an rtx, and we have an integer, so it might be simpler to just substitute in the right arithmetic for the constant check. I'll attach an untested patch that works for the testcase. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040
[Bug target/30282] Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none
--- Comment #11 from wilson at codesourcery dot com 2010-02-06 06:23 --- Subject: Re: Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none On Thu, 2010-02-04 at 10:48 +, rearnsha at gcc dot gnu dot org wrote: > I've been testing the attached patch on ARM (well, thumb) where there's a > similar issue. > Could someone test this on PPC? I tried it on PPC for the two testcases in PR 30282 and PR 42953 and it works for both of them. However, as Andrew Pinski mentioned, this is wrong for some targets, so we would have to make this a target hook to be usable. Meanwhile, I also looked at the thumb testcase, and it looks like a simple matter of emitting a stack_tie insn in thumb1_expand_epilogue before the SP sets are emitted. The ARM port currently only emits the stack_tie insn in the prologue. So we have two possible solutions here: 1) Add a new target hook to control whether sched makes stack pointer sets depend on all MEMs. 2) Emit stack_tie in epilogue always for Thumb and rs6000 ABI_V4, and modify rs6000 stack_tie to use (mem (scratch)) like ARM. I don't feel strongly either way, but considering that the prologue and epilogue code already contain a lot of target dependent magic, I don't see the need for adding more target hooks when all we need is a few small ARM and rs6000 port changes. The second change also makes the RTL self-descriptive. With the first change, we have to make sure that any optimization pass that might move instructions around knows that stack pointers sets are special, and conflict with all MEMs. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282
[Bug rtl-optimization/30688] Branch registers loaded too late on ia64
--- Comment #6 from wilson at codesourcery dot com 2009-03-16 19:07 --- Subject: Re: Branch registers loaded too late on ia64 steven at gcc dot gnu dot org wrote: > --- Comment #5 from steven at gcc dot gnu dot org 2009-03-16 08:46 > --- > Can someone point me to the IA64 optimiation manuals mentioned in comment #0? > You can find manuals on the Intel web site. You want the Intel Itanium 2 Processor Reference Manual (For Software Development and Optimization). Chapter 7 talks about branch instructions. > * Which branch registers can I use? Any one of the 8 special branch registers, class BR_REGS. > * What does "as early as possible" mean in comment #0? The manual says there should be several cycles between the branch register write and the branch for correct prediction. There is probably no "too early" to worry about, as long as you don't use more than the available 8 registers. You want to avoid reloads here. Some of the regs are call clobbered, some are preserved, and probably some are reserved for call/return. I don't recall all of the ABI details. You can look them up in the manuals. See the Itanium Software Conventions and Runtime Architecture Guide. > * What happens if a value is assigned to a branch register on IA64? Is the > prefetcher always triggered? What is the latency of the prefetching after a > branch register has been assigned a value? This is complicated. I suggest downloading the docs and reading them. > * Is there a possibility to add a prediction hint to say "branch register A is > more likely to be used than branch register B" when multiple branch registers > are assigned a value in the same basic block? There is separate predication support for each branch register, but I assume this is about priority for prefetching? Yes, there are branch hints for that. See the Itanium Architecture Software Developer's Manual, Volume 1, section 4.5 is for branch instructions. There is a "few" completer for prefetching a few lines, and a "many" completer for prefetching many lines. ia64.md uses "many" for call and return. Jim -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688