[Bug target/43902] suboptimal MIPS widening multiply accumulate

2010-06-16 Thread wilson at codesourcery dot com


--- Comment #12 from wilson at codesourcery dot com  2010-06-17 04:29 
---
Subject: Re:  suboptimal MIPS widening multiply accumulate

On Wed, 2010-06-16 at 13:29 +, bernds at gcc dot gnu dot org wrote:
> Could you retest the MIPS fixed-point testcases with the obvious fix?  You
> probably have the MIPS toolchain set up already and it would probably take me
> more time.

Unfortunately, the fix is not quite that easy.  We need to handle
saturating types in the optab check.  I already wrote the multiply add
check that way, but the widening-multiply check needs to be fixed. It
turns out that there isn't any obtab for saturating widening multiply,
so we have to add one.  Also, there is also no pattern for it in the
mips-fixed.md file. There is only the saturating widening multiply
accumulate pattern.  I believe I can fix this by adding a multiply
accumulate pattern that adds zero, but now I'm starting to get cascading
patches here.  This could take a few days.

Since I was looking at adding zero, I couldn't help but notice that the
MIPS fixed point support seems incomplete.
_Sat long _Fract  f1 (void) { return 0; }
gives me
j   $31
lw  $2,.LC0
.LC0:
.word   0x0
But this can be split off into a separate bug report.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902



[Bug target/43902] suboptimal MIPS widening multiply accumulate

2010-06-15 Thread wilson at codesourcery dot com


--- Comment #10 from wilson at codesourcery dot com  2010-06-16 06:30 
---
Subject: Re:  suboptimal MIPS widening multiply accumulate

On Wed, 2010-06-09 at 20:21 +, bernds at gcc dot gnu dot org wrote:
> What do you think?  Please let me know what your MIPS tests turned up.

I'm looking at this again.  My MIPS tests showed that my patch fixed 17
gcc.target/mips multiply-accumulate testcases that were accidentally
broken by the new tree level widening multiply optimization pass.  My
new testcase madd-9.c failed, but it turned out that I accidentally
double-patched the file.  Fixing the file, it now passes.  I forgot to
include this testcase in my second patch, though it was there in the
first one.  The bad news is that there are two new failures:
dpaq_sa_l_w.c and dpsq_sa_l_w.c.  These are MIPS DSP fixed-point
multiply-accumulate testcases, which is something I definitely didn't
bother to check earlier.  Overall I would say it is a win, since it
fixes many int/long/long long examples, and only breaks a few obscure
cases that should be fixable with a little more work.

I'm looking at your patch too.  There is a testcase that doesn't appear
to belong, gcc.target/arm/pr42172-1.c.  I'm not sure why
convert_plusminus_to_widen needs to check for MULT and call
convert_mult_to_widen.  If we are scanning basic blocks from top to
bottom, it seems that the multiplies would have already been checked.
But maybe this has something to do with looking at operands computed in
other basic blocks that haven't been scanned yet, in which case it would
make sense.  I'm not sure if that is possible.  Otherwise, it looks like
you have completed and cleanup up a bunch of stuff that I didn't get far
enough to notice.  It all looks good to me.  I can try rerunning MIPS
testcases to see how it works.

I see that the failure of the DSP fixed-point tests is because we have
checks for
+  if (TREE_CODE (type) != INTEGER_TYPE)
+return false;
and the old code in expr.c that we are replacing does
-  if ((TREE_CODE (type) == INTEGER_TYPE
-  || TREE_CODE (type) == FIXED_POINT_TYPE)
So that looks like an easy fix (assuming no other complications), but
will require a rebuild and retest.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902



[Bug target/43902] suboptimal MIPS widening multiply accumulate

2010-06-07 Thread wilson at codesourcery dot com


--- Comment #8 from wilson at codesourcery dot com  2010-06-07 22:18 ---
Subject: Re:  suboptimal MIPS widening multiply accumulate

On Mon, 2010-06-07 at 21:34 +, bernds at gcc dot gnu dot org wrote:
> Jim, are you still working on this or should I pick it up?

I'm working on it, but only a few hours a week, so my progress will be
slow.  If you have more time that would help.  I put a patch in the PR
that seems to be working well, but there were a few minor issues that
turned up during a MIPS testsuite run that I wanted to look at before
formally submitting the patch.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902



[Bug target/43902] suboptimal MIPS widening multiply accumulate

2010-05-03 Thread wilson at codesourcery dot com


--- Comment #3 from wilson at codesourcery dot com  2010-05-03 22:28 ---
Subject: Re:  suboptimal MIPS widening multiply accumulate

On Tue, 2010-04-27 at 09:33 +, rguenth at gcc dot gnu dot org wrote:
> For more general optimization you might want to move all this code to
> the tree pass before expansion that detects widening multiplication.
> The DOT_PROD_EXPR tree code can be used to carry the information to
> the expander.

I didn't understand the point here at first, but after updating my tree
and rebuilding I now see Bernd's patch to add the new widening_mul
optimization pass which I hadn't noticed before.  The MIPS macc/madd
support is now even more broken than before.  I can generate them at -O
but not at -O2 now, as the code in expand_expr_real_2 doesn't handle
WIDEN_MULT_EXPR which is created at -O2 by the new code.

So as Richard Guenther mentioned, the solution seems to be to remove the
old code in expand_expr_real_2 and add something new in
tree-ssa-math-opt.c to replace it.  This means we won't be able to
generate widening multiply accumulate with -O, but we will at -O2.  This
is probably acceptable, though it makes the solution more complicated
than I had originally expected.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43902



[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces

2010-04-23 Thread wilson at codesourcery dot com


--- Comment #3 from wilson at codesourcery dot com  2010-04-23 22:40 ---
Subject: Re:   New: Emit debug info allowing inlined functions
 to show in stack traces

On 04/21/2010 02:26 AM, scovich at gmail dot com wrote:
> It would be very nice if gcc emitted debug information that allowed profilers
> and debuggers the option to extract a stack trace which included calls to
> inlined functions. This would allow developers much greater insight into the
> behavior of optimized code.

Try the -i option of addr2line.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug middle-end/43760] [4.6 regression] New test failures

2010-04-19 Thread wilson at codesourcery dot com


--- Comment #4 from wilson at codesourcery dot com  2010-04-20 01:54 ---
Subject: Re:  [4.6 regression] New test failures

On Tue, 2010-04-20 at 00:01 +, sje at cup dot hp dot com wrote:
> Any ideas on how to fix the compiler?  The best idea I could come up with was
> to check each basic block to see if there are any conditional sets of 
> predicate
> registers in it and add pred.rel.mutex lines for those registers.  If we add
> pred.rel.mutex lines for all used predicate registers we wind up adding a lot
> of unneeded pred.rel.mutex statements.

I haven't had a chance to look at all of the details yet.

I believe that we only need to handle the case when one mutex pair is
used to create another one, e.g. something like
   (p17) cmp.geu p6, p7 = r42, r51
   (p16) cmp.gtu p6, p7 = r42, r51
We also only need to handle the case where we have a call-clobbered
mutex pair, e.g. p6/p7 in this case, and there was a call insn in
between here and any previous set of p6/p7.

Currently, we only insert pred.rel.mutex lines at the beginning of basic
blocks that start with labels.  So now we also need to emit them after
such sequences, or alternatively after calls.

I'm not sure if the assembler is wrong here.  mutex just means that we
will never have both p16 and p17 set at the same time.  So there are 3
possibilities, p16 true, p17 true, neither true.  If p16 is true, then
clearly p6/p7 will be mutex.  Likewise if p17 is true.  But if neither
is true, then p6/p7 will be unmodified, and hence aren't guaranteed to
be mutex unless they were already mutex before these lines.  This is why
it only fails when we have a call-clobbered mutex pair after an
intervening call, as now we don't know anything about the previous
values of p6/p7.

If we know that p16/p17 were set in a simple compare, then we will know
that 1 and only 1 will be true, which is a stronger condition than
mutex, and in this case p6/p7 will be mutex.  There is no
standard .pred.rel syntax for this though, which means that if the
p16/p17 set was in a different basic block, and we have only
the .pred.rel.mutex line inserted by the compiler, then the assembler
can't prove that p6/p7 will be mutex after this point.  I'm not eager to
extend the assembler syntax to fix this, and rewrite the DV code in the
assembler, so I'm hoping that we can fix it in the compiler by emitting
another .pred.rel.mutex line in this case.  This hopefully doesn't
happen too often and isn't too hard.

OK, maybe I have studied it a little, but I need to look at this a bit
more before I'm sure I understand everything that is going on here.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43760



[Bug driver/43687] Unexpected error message for bad command line argument

2010-04-09 Thread wilson at codesourcery dot com


--- Comment #5 from wilson at codesourcery dot com  2010-04-09 22:16 ---
Subject: Re:  Unexpected error message for bad command line
 argument

On 04/09/2010 02:34 PM, wilson at gcc dot gnu dot org wrote:
> POSIX says that for command line arguments "-a -d", "-d -a", "-da", and "-ad"
> are all equivalent.

Sorry, this is an overstatement.  I didn't mean to imply that option 
order doesn't matter.  So I should have said that "-d -a" and "-da" are 
equivalent, which is not true in gcc.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43687



[Bug rtl-optimization/43520] gcc.dg/pr43058.c uses way too memory on ia64

2010-03-25 Thread wilson at codesourcery dot com


--- Comment #3 from wilson at codesourcery dot com  2010-03-25 23:21 ---
Subject: Re:  gcc.dg/pr43058.c uses way too
 memory on ia64

On Thu, 2010-03-25 at 19:05 +, pinskia at gcc dot gnu dot org wrote:
> Did you test earlier versions of GCC with the testcase?

I hadn't gotten around to it yet.  I am at Cisco Tue-Thu, and I can't do
much IA-64 work these days.

It compiles OK with gcc-4.4.

> Also it should not hard to figure out where in the scheduler the memory is
> being used.  The kernel panic is not our fault so make sure the kernel guys 
> get
> a bug report also; the memory usage is though.

Another thing I hadn't gotten around to yet.  The problem appears to be
Vlad's -fsched-pressure stuff.  reg_last->implicit_sets has about 5000
instructions on the list.  If we have 5000 insns each of which is
getting a dependency on each other, that is 25M dependencies, at 40
bytes each, which is 1GB, not counting other overhead.  There is also
memory being allocated somewhere else, as this testcase needs more than
3GB to be compiled.  It appears that these lists need some kind of
throttling to prevent them from getting too big.  Or maybe they aren't
being constructed correctly for IA-64. -fno-sched-pressure doesn't help,
as that does not disable creation of the implicit register sets.  I can
spend more time looking at this later.

Everybody knows the linux kernel oom killer has problems, and that there
are no easy solutions.  I don't need to report that.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43520



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-22 Thread wilson at codesourcery dot com


--- Comment #13 from wilson at codesourcery dot com  2010-03-23 02:11 
---
Subject: Re:  [ia64] Inappropriate address spills

On Mon, 2010-03-22 at 20:48 +, sje at cup dot hp dot com wrote:
> Since the proposed patch to meant to address non-optimimal code generation I
> decided to try the patch with SPEC2006 and see if it helped the performance.

This isn't a simple issue, performance wise.

If the compiler puts an address+offset into sdata, then we get something
like
.sdata
.align 8
.LC2:
data8   mv_tables#+72
.text
addl r14 = @gprel(.LC2), gp
;;  
ld8 r40 = [r14]

If the compiler emits the code that the ABI intended here, we get
something like
addl r14 = @ltoff(mv_tables#), r1
;;  
ld8 r14 = [r14]
;;  
adds r40 = 72, r14
This sequence is one instruction longer.

We deliberately do this to avoid accidentally overflowing the got.  If
you access the same array with a thousand different offsets, then
instead of putting a thousand entries into the got, we put one entry in
the got, and then add in the offset.  So this is a smaller got, but an
extra add instruction that may add to latency if we can't hide it.

This code is working for got entries, but not when an address is forced
into constant pool.  So which one is better depends on what the code
looks like.  If you have a small number of variables and a large number
of offsets, then you will get a lot more memory references and a larger
got with the first alternative.  If you have a large number of variables
and a small number of offsets, then you will get more add instructions
with the second alternative.

This issue originally came up via PR 41567, which triggers a linker
error.  Perhaps that testcase would not have triggered the linker error
if we were using the second alternative.  I haven't looked at this PR as
yet, so I don't know the details of what is going on here.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-17 Thread wilson at codesourcery dot com


--- Comment #11 from wilson at codesourcery dot com  2010-03-18 00:12 
---
Subject: Re:  [ia64] Inappropriate address spills

On Wed, 2010-03-17 at 23:47 +, sje at cup dot hp dot com wrote:
> Reading Richard's initial comment I thought the problem was that the code was
> (or could be) illegal because the relocation may be out of range and we
> shouldn't
> use the gprel relocation for any of these constant pool references.

The code is invalid only if you use -mno-sdata.

The only reason why Richard wanted to use -mno-sdata is because he saw
the non-optimal code, and hoped that -mno-sdata would fix that.  It
didn't.  It just made the problem worse by generating invalid code.  So
there is probably also a bug here in the handling of -mno-sdata, but
fixing that doesn't help Richard, because it doesn't solve the real
problem he was complaining about, which was the non-optimal code.

I suspect that the -mno-sdata bug is that ia64_expand_load_address does
  else if (sdata_symbolic_operand (src, VOIDmode))
emit_insn (gen_load_gprel (dest, src));
and sdata_symbolic_operand fails to check TARGET_NO_SDATA, so it assumes
that constant pool entries are still in the sdata section, and we end up
with gprel references to items that aren't in the sdata section anymore,
which will fail at link time.  If we fix that, -mno-sdata will work, but
we will still have all of the inappropriate address spills (i.e.
non-optimal code) which is what the PR summary is complaining about.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-17 Thread wilson at codesourcery dot com


--- Comment #9 from wilson at codesourcery dot com  2010-03-17 23:25 ---
Subject: Re:  [ia64] Inappropriate address spills

On Wed, 2010-03-17 at 22:09 +, sje at cup dot hp dot com wrote:
> I tried the patch and didn't have any problem bootstrapping and I didn't see
> any regressions.  It also fixed my small test case, but when I went back and
> tried some of the other tests from PR 28490 I still saw some of the bad gprel
> usages.  Here is a slightly more cutdown testcase from 28490 that still fails
> with the patch applied and when compiling with -O2.

I already mentioned in my previous message that the patch does not
eliminate all instances of the gprel usages (constant-pool references).
It just eliminates most of them.  If you want to eliminate all of them,
then you need to get LEGITIMATE_PIC_OPERAND_P working, which in turn
will require setting flag_pic by default. which in turn will require
verifying that this doesn't break anything.  That is probably more work
than I have time for.

And by "fails", you mean that the compiler is still emitting undesirable
code in some cases.  The code isn't incorrect, just non-optimal.  And we
get less non-optimal code with the patch, so it seems to be better than
nothing.  Unless you want to try to write the better
LEGITIMATE_PIC_OPERAND_P solution.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/43348] [4.4 regression] ICE in final_scan_insn, at final.c:2604

2010-03-16 Thread wilson at codesourcery dot com


--- Comment #6 from wilson at codesourcery dot com  2010-03-17 01:46 ---
Subject: Re:   New: [4.4 regression] ICE in final_scan_insn,
 at final.c:2604

The testcase has
>if (newCapacity > std::numeric_limits::max() / sizeof(T))
> do { *(int *)(uintptr_t)0xbbadbeef = 0; ((void(*)())0)(); } 
> while(false);
which is not very C++ish, but none the less should "work".

The problem is in the ia64.md file.  The call patterns use the predicate 
call_operand which accepts only symbol_ref and register, and constraints 
that include a register "b" and "i".  The call operand remains a 
register until reload, which sees that we have a register equivalent to 
zero and an "i" constraint and does the substitution.  After reload, we 
then fail, as call_operand does not accept const_int.

We need to fix the ia64.md call patterns to use "s" instead of "i". 
This bug has apparently been present since the port was submitted in 
gcc-3.0.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43348



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-13 Thread wilson at codesourcery dot com


--- Comment #5 from wilson at codesourcery dot com  2010-03-13 08:23 ---
Subject: Re:  [ia64] Inappropriate address spills

On third thought...
The code here makes sense if we were having problems with invalid
constant recombinations.  symbol+const gets split by ia64_expand_move
into symbol+highpart which is put in the got, and lowpart which is added
with adds.  If a late optimization pass saw a REG_EQUAL note and tried
to put the constant back together again, that would be inconvenient.
The ia64_legitimate_constant_p code would prevent that.  Unfortunately,
this code also prevents us to accepting a constant before
ia64_expand_move is called, which prevents us from splitting it in the
first place.  This looks to be more complicated than I hoped.

There is a variable we can check to see if we are in the expand phase,
currently_expanding_to_rtl.  Perhaps we need to accept any constant if
currently_expanding_to_rtl is true, and only accept symbol+highpart (the
current code) if it is false.  Or alternatively, if this problem only
happened at reload time, then checking reload_in_progress and/or
reload_completed might work.

I will have to look at PR28490 again and try to remember what the
original problem that we were fixing was.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-12 Thread wilson at codesourcery dot com


--- Comment #4 from wilson at codesourcery dot com  2010-03-13 03:16 ---
Subject: Re:  [ia64] Inappropriate address spills

Or maybe we should just accept any constant here?  I tried that, and for

typedef struct table { int a; int b; int c; } table;
extern table mv_tables[10];
void foo()
{
  init_mv_table(&mv_tables[5]);
}

I get
addl r35 = @ltoff(mv_tables#+606208), r1
;;
ld8 r35 = [r35]
;;
adds r35 = -6208, r35
which is perhaps still better than using @gprel and the constant pool.

In this case, in ia64_legitimate_constant_p we can just return true if
aligned_offset_symbol_operand is true, and there is no need for the
addend local variable.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/42040] [ia64] Inappropriate address spills

2010-03-12 Thread wilson at codesourcery dot com


--- Comment #2 from wilson at codesourcery dot com  2010-03-13 03:03 ---
Subject: Re:  [ia64] Inappropriate address spills

This broke between gcc-4.0.0 and gcc-4.1.2.  It appears to be the patch
for PR 28490.  There is a test in ia64_legitimate_constant_p for symbol
+offset, where we require that the offset have the low 14-bits be zero
which is not at all what we intended to test for.  This should be a test
for an offset that fits into a short add immediate instruction, which
takes a signed-14 bit value, aka CONST_OK_FOR_I, or, hmm, let's look
that up, I guess that would be satisfies_constraint_I nowadays.  Except
that satisfies_constraint_I wants an rtx, and we have an integer, so it
might be simpler to just substitute in the right arithmetic for the
constant check.  I'll attach an untested patch that works for the
testcase.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42040



[Bug target/30282] Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none

2010-02-05 Thread wilson at codesourcery dot com


--- Comment #11 from wilson at codesourcery dot com  2010-02-06 06:23 
---
Subject: Re:  Optimization flag -O1 -fschedule-insns2
 cause red zone to be used when there is none

On Thu, 2010-02-04 at 10:48 +, rearnsha at gcc dot gnu dot org
wrote:
> I've been testing the attached patch on ARM (well, thumb) where there's a
> similar issue. 
> Could someone test this on PPC?

I tried it on PPC for the two testcases in PR 30282 and PR 42953 and it
works for both of them.  However, as Andrew Pinski mentioned, this is
wrong for some targets, so we would have to make this a target hook to
be usable.

Meanwhile, I also looked at the thumb testcase, and it looks like a
simple matter of emitting a stack_tie insn in thumb1_expand_epilogue
before the SP sets are emitted.  The ARM port currently only emits the
stack_tie insn in the prologue.

So we have two possible solutions here:
1) Add a new target hook to control whether sched makes stack pointer
sets depend on all MEMs.
2) Emit stack_tie in epilogue always for Thumb and rs6000 ABI_V4, and
modify rs6000 stack_tie to use (mem (scratch)) like ARM.

I don't feel strongly either way, but considering that the prologue and
epilogue code already contain a lot of target dependent magic, I don't
see the need for adding more target hooks when all we need is a few
small ARM and rs6000 port changes.  The second change also makes the RTL
self-descriptive.  With the first change, we have to make sure that any
optimization pass that might move instructions around knows that stack
pointers sets are special, and conflict with all MEMs.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282



[Bug rtl-optimization/30688] Branch registers loaded too late on ia64

2009-03-16 Thread wilson at codesourcery dot com


--- Comment #6 from wilson at codesourcery dot com  2009-03-16 19:07 ---
Subject: Re:  Branch registers loaded too late
 on ia64

steven at gcc dot gnu dot org wrote:
> --- Comment #5 from steven at gcc dot gnu dot org  2009-03-16 08:46 
> ---
> Can someone point me to the IA64 optimiation manuals mentioned in comment #0? 
>  

You can find manuals on the Intel web site.  You want the Intel Itanium 
2 Processor Reference Manual (For Software Development and 
Optimization).  Chapter 7 talks about branch instructions.

> * Which branch registers can I use?

Any one of the 8 special branch registers, class BR_REGS.

> * What does "as early as possible" mean in comment #0?

The manual says there should be several cycles between the branch 
register write and the branch for correct prediction.  There is probably 
no "too early" to worry about, as long as you don't use more than the 
available 8 registers.  You want to avoid reloads here.  Some of the 
regs are call clobbered, some are preserved, and probably some are 
reserved for call/return.  I don't recall all of the ABI details.  You 
can look them up in the manuals.  See the Itanium Software Conventions 
and Runtime Architecture Guide.

> * What happens if a value is assigned to a branch register on IA64?  Is the
> prefetcher always triggered?  What is the latency of the prefetching after a
> branch register has been assigned a value?

This is complicated.  I suggest downloading the docs and reading them.

> * Is there a possibility to add a prediction hint to say "branch register A is
> more likely to be used than branch register B" when multiple branch registers
> are assigned a value in the same basic block?

There is separate predication support for each branch register, but I 
assume this is about priority for prefetching?  Yes, there are branch 
hints for that.  See the Itanium Architecture Software Developer's 
Manual, Volume 1, section 4.5 is for branch instructions.  There is a 
"few" completer for prefetching a few lines, and a "many" completer for 
prefetching many lines.  ia64.md uses "many" for call and return.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688