gcc-4.4-20120228 is now available

2012-02-28 Thread gccadmin
Snapshot gcc-4.4-20120228 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20120228/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 184643

You'll find:

 gcc-4.4-20120228.tar.bz2 Complete GCC

  MD5=e987d88e53ecd256f110a3148523c2bc
  SHA1=38e52b168b1b60664fdd4d85a3ad4665f75b1a75

Diffs from 4.4-20120221 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: random commentary on -fsplit-stack (and a bug report)

2012-02-28 Thread Jay Freeman (saurik)
> > "Jay Freeman (saurik)" 
> "Ian Lance Taylor" 

> Thanks for the bug report and the analysis.  I think it does simply
> require an '&'.  That makes it analogous to the way
> __morestack_release_segments is used in generic-morestack-thread.c. 

The only reason I hesitated on that is that it might not make sense to update 
the pointer in the context. In my specific case, that will actually cause it to 
crash ;P, as while the current stack I'm calling __splitstack_releasecontext 
from is valid, the context pointer I'm passing is actually stored on the old 
stack, and will be unallocated by __morestack_releasse_segments.

I can always just change my code to copy the context to the other stack before 
calling __splitstack_releasecontext, however, so that isn't a problem for me. 
Though, I also wasn't certain what the releasecontext function actually wanted 
to do with that pointer, as I hadn't yet read much of the morestack code; I now 
see that it is just the head of a linked list, so yeah: passing the address out 
of the context seems fine.

> As you know, I wanted to allow for future expansion.  I agree that it
> would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
> off space for time, since it would mean that setcontext would have to
> walk up the list.  I think CURRENT_STACK is required for
> __splitstack_find_context.  And __splitstack_find_context is required
> for Go's garbage collector.  At least, it's not obvious to me how to
> avoid requiring CURRENT_STACK for that case.

The basis of that suggestion was not just that the items in the context could 
be removed, but that the underlying state used by split stacks might not need 
the values at all. In this case, I am not certain why __morestack_segments is 
needed: it seems to only come in to play when __morestack_current_segment is 
NULL (and I'm not certain how that would happen) and while deallocating dynamic 
blocks (which is already linear).

I might provide a patch to better describe what I mean by this. I've started 
the process of getting a copyright assignment in place (sent an e-mail to 
fsf-reco...@gnu.org per http://gcc.gnu.org/wiki/CopyrightAssignment).

> I agree.  Want to write a patch?  Or at least file a bug report.

Sure.

> [paragraph moved below]

> > 7) Using the linker to handle the transition between split-stack and
> > non-split-stack code seems like a good way to solve the problem of "we
> > need large stacks when hitting external code", but in staring at the
> > resulting code I have in my project I'm seeing that it isn't reliable:
> > if you have a pointer to a function the linker will not know what you
> > are calling. In my case, this is coming up often due to using
> > std::function.
> 
> Yes, good point.  I think I had some plan for handling that but I no
> longer recall what it was.

After getting more sleep, I realize that this problem is actually much more 
endemic than I had even previously thought. Most any vaguely object-oriented 
library is going to have tons of function pointers in it, and you often 
interact solely with those function pointers (as in, you have no actual symbol 
references anywhere). A simple example: in the case of C++, any call to a 
non-split-stack virtual function will fail.

"""Function pointers are a tricky case. In general we don't know whether a 
function pointer points to split-stack code. Therefore, all calls through a 
function pointer will be modified to call (or jump to) a special function 
__fnptr_morestack. This will use a target specific function calling sequence, 
and will be implemented as though it were itself a function call instruction. 
That is, all the parameters will be set up, and then the code will jump to 
__fnptr_morestack. The __fnptr_morestack function takes two parameters: the 
function pointer to call, and the number of bytes of arguments pushed on the 
stack. (This is not yet implemented.)"""

That paragraph is from your design document (SplitStacks on the GCC wiki). I 
presume that this solution would only work if __fnptr_morestack always assumed 
that the target did not support split-stack? Alternatively, I can see having 
that stub look at the function to see if its first instruction was a comparison 
to the TCB stack limit entry (using similar logic to that used by the linker)? 
[also, see below in this e-mail]

> > More awkwardly, split-stack functions that mention (but do not call)
> > non-split-stack functions (such as to return their address) are being
> > mis-flagged by the linker. Honestly, I question whether the linker
> > fundamentally has enough information about what is going on to be able
> > to make sufficiently accurate decisions with regards to stack
> > constraints to warrant the painful abstraction breakage that
> > split-stack uses. :(
> 
> Your're right that the linker doesn't really have enough information.
> But is a split-stack function that returns the address of a
> non-split-stack function really so frequent that it

Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt


On Tue, 2012-02-28 at 11:52 -0600, William J. Schmidt wrote:
> On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote:
> 
> > I think this is probably a problem with how cprop_into_successor_phis
> > works.  It only propagates into immediate successors of a block.  In
> > this case copies are propagated from bb12 into phis in bb13 and bb14 (of
> > which there are none).  Seems like it could propagate into phis of all
> > dominator children, which would catch bb15.  I haven't looked at this
> > part of the code in too much detail, but I'm guessing since bb15's
> > immediate dominator is bb12, this is the last chance to get the value
> > propagated into the phi.
> > 
> > When I get more time, I'll walk through the logic for this test to
> > confirm, and post here with the results.
> > 
> > If this is the problem, then I doubt anything will be fixed for this in
> > the limited time left for 4.7.  You'd want to open a missed-optimization
> > PR for it.
> 
> No, this isn't right.  I see in the dump that the copy is being removed
> from the const_and_copies_table after processing block 13 and before
> processing block 14.  That's presumably a bug.  I'll keep looking.
> 

OK, I see the problem.  We have a long-standing bug in dom's use of edge
threading, which quietly corrupts the const_and_copy stack.  I'll create
a bug report and post a patch for you to try out.  My testing shows that
the redundant phi is properly handled now, but let's see if it solves
your performance issue.

Thanks,
Bill

> Bill



Re: random commentary on -fsplit-stack (and a bug report)

2012-02-28 Thread Ian Lance Taylor
"Jay Freeman (saurik)"  writes:

> As demonstrated by these snippets, __morestack_segments is a pointer
> to a stack_segment; it is being stored in the context as a void *, but
> is being removed from the context and being passed directly to
> __morestack_release_segments, which in turn expects a pointer to a
> pointer to a stack_segment, not just a bare pointer to a stack
> segment. Probably quite simple to fix (although might be more complex
> than just "add a &").

Thanks for the bug report and the analysis.  I think it does simply
require an '&'.  That makes it analogous to the way
__morestack_release_segments is used in generic-morestack-thread.c.  So,
fixed with the appended patch.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.


> 1) The current implementation (maybe this is intended to change?) uses
> mmap() to allocate stack segments, which means that every allocation
> involves a system call, a lock in the kernel on a slow data structure
> (anon_vma), and has some non-zero probability of ending up with a
> separate VMA (which is not only slow, but in my understanding uses up
> a limited resource: you can only have 64k VMAs per process).
>
> Is it possible to instead expose the functionality for allocating
> stack segments out of libgcc for easy replacement by coroutine
> runtimes? I would really love to be able to use my existing memory
> manager to allocate the stack segments. I realize that this allocation
> routine would need to be able to operate with almost no stack: that
> isn't a problem (I can always pivot to another stack if I need any
> stack).

Makes sense to me.  It could actually be used by Go as well, since libgo
has its own memory allocator.  Probably the best approach would be to
add another function, __splitstack_set_allocator or something, which
would set the allocator function.  The allocator would have to be
updated atomically, of course.  And, as you say, it would have to
operate with limited stack space.


> 2) I had seen a discussion on the mailing list regarding argument
> copying, and I must say I'm somewhat confused as to why it is
> sufficient to memcpy the arguments from the old stack to the new one:
> if I have an argument with a non-POD type that has a non-trivial copy
> constructor, it would seem like I need a copy operation to be able to
> use the object from the new stack (maybe, for example, it has an
> internal pointer).

Non-POD values are always passed by pointer.  They are not copied.  See
pass_by_reference in function.c.  Basically, the frontend is responsible
for calling the copy constructor where required.  This is because the
middle-end may want to shuffle the value around in various ways, and we
don't want it to call the copy constructor multiple times while doing
that.


> 3) If I have either blocked signals on my thread or have setup an
> alternate signal stack with sigaltstack, I can get away with
> super-tiny stacks. However, allocate_segment has a minimum stack size
> of MINSIGSTKSZ (I presume to allow for signals), which on some systems
> I use (such as Mac OS X) I've seen be set as high as 32kB. (Meanwhile,
> MINSIGSTKSZ on Linux is smaller than a page, so mmap() can't even
> allocate it.)

Makes sense to me.  There should be a way for the program to specify the
minimum stack segment size.


> 4) 10 64-bit words for the splitstack context is a really large amount
> of space. :( I don't even have that much CPU-state (there are only 8
> registers that really need to be saved when switching between
> coroutines). Considering the stack segments form a doubly-linked-list,
> it would seem like MORESTACK_SEGMENTS and CURRENT_SEGMENT are
> redundant. I also feel like CURRENT_STACK could be worked around
> fairly well.

As you know, I wanted to allow for future expansion.  I agree that it
would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
off space for time, since it would mean that setcontext would have to
walk up the list.  I think CURRENT_STACK is required for
__splitstack_find_context.  And __splitstack_find_context is required
for Go's garbage collector.  At least, it's not obvious to me how to
avoid requiring CURRENT_STACK for that case.

Not sure what to do here.  Obviously you could cheat and use 7 words
instead of 10.


> 5) As currently implemented, the stack space check is added to every
> single function. However, some functions do not actually use the stack
> (or might even be avoiding memory accesses entirely). When I look at
> the disassembly of my project, I see many references to __morestack
> and "cmp %fs:0x70,%rsp" in functions that would otherwise be just a
> few instructions long. Functions that don't use stack should avoid the
> check.

I agree.  Want to write a patch?  Or at least file a bug report.


> 6) I have noticed that the purpose of having split stacks seems
> largely hobbled by the way the linker enforces humungous stacks on
> outgoing calls to non-split-stack cod

Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt
On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote:

> I think this is probably a problem with how cprop_into_successor_phis
> works.  It only propagates into immediate successors of a block.  In
> this case copies are propagated from bb12 into phis in bb13 and bb14 (of
> which there are none).  Seems like it could propagate into phis of all
> dominator children, which would catch bb15.  I haven't looked at this
> part of the code in too much detail, but I'm guessing since bb15's
> immediate dominator is bb12, this is the last chance to get the value
> propagated into the phi.
> 
> When I get more time, I'll walk through the logic for this test to
> confirm, and post here with the results.
> 
> If this is the problem, then I doubt anything will be fixed for this in
> the limited time left for 4.7.  You'd want to open a missed-optimization
> PR for it.

No, this isn't right.  I see in the dump that the copy is being removed
from the const_and_copies_table after processing block 13 and before
processing block 14.  That's presumably a bug.  I'll keep looking.

Bill



Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt
On Tue, 2012-02-28 at 11:21 +0100, Richard Guenther wrote:
> On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu  wrote:
> >
> >
> >> -Original Message-
> >> From: Jiangning Liu
> >> Sent: Tuesday, February 28, 2012 4:07 PM
> >> To: Jiangning Liu; 'William J. Schmidt'
> >> Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> >> Subject: RE: A question about redundant PHI expression stmt
> >>
> >>
> >>
> >> > -Original Message-
> >> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
> >> Of
> >> > Jiangning Liu
> >> > Sent: Tuesday, February 28, 2012 11:19 AM
> >> > To: 'William J. Schmidt'
> >> > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> >> > Subject: RE: A question about redundant PHI expression stmt
> >> >
> >> >
> >> >
> >> > > -Original Message-
> >> > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
> >> Behalf
> >> > Of
> >> > > William J. Schmidt
> >> > > Sent: Monday, February 27, 2012 11:32 PM
> >> > > To: Jiangning Liu
> >> > > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> >> > > Subject: Re: A question about redundant PHI expression stmt
> >> > >
> >> > > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
> >> > > > Hi,
> >> > > >
> >> > > > For the small case below, there are some redundant PHI expression
> >> > > stmt
> >> > > > generated, and finally cause back-end generates redundant copy
> >> > > instructions
> >> > > > due to some reasons around IRA.
> >> > > >
> >> > > > int *l, *r, *g;
> >> > > > void test_func(int n)
> >> > > > {
> >> > > > int i;
> >> > > > static int j;
> >> > > > static int pos, direction, direction_pre;
> >> > > >
> >> > > > pos = 0;
> >> > > > direction = 1;
> >> > > >
> >> > > > for ( i = 0; i < n; i++ )
> >> > > > {
> >> > > > direction_pre = direction;
> >> > > >
> >> > > > for ( j = 0; j <= 400; j++ )
> >> > > > {
> >> > > >
> >> > > > if ( g[pos] == 200 )
> >> > > > break;
> >> > > >
> >> > > > if ( direction == 0 )
> >> > > > pos = l[pos];
> >> > > > else
> >> > > > pos = r[pos];
> >> > > >
> >> > > > if ( pos == -1 )
> >> > > > {
> >> > > > if ( direction_pre != direction )
> >> > > > break;
> >> > > >
> >> > > > pos = 0;
> >> > > > direction = !direction;
> >> > > > }
> >> > > >
> >> > > > }
> >> > > >
> >> > > > f(g[pos]);
> >> > > > }
> >> > > > }
> >> > > >
> >> > > > I know PR39976 has something to do with this case, and check-in
> >> > > r182140
> >> > > > caused big degradation on the real benchmark, but I'm still
> >> > confusing
> >> > > around
> >> > > > this issue.
> >> > > >
> >> > > > The procedure is like this,
> >> > > >
> >> > > > Loop store motion generates code below,
> >> > > >
> >> > > > :
> >> > > >   D.4084_17 = l.4_13 + D.4077_70;
> >> > > >   pos.5_18 = *D.4084_17;
> >> > > >   pos_lsm.34_103 = pos.5_18;
> >> > > >   goto ;
> >> > > >
> >> > > > :
> >> > > >   D.4088_23 = r.6_19 + D.4077_70;
> >> > > >   pos.7_24 = *D.4088_23;
> >> > > >   pos_lsm.34_104 = pos.7_24;
> >> > > >
> >> > > > :
> >> > > >   # prephitmp.29_89 = PHI 
> >> > > >   # pos_lsm.34_53 = PHI 
> >> > > >   pos.2_25 = prephitmp.29_89;
> >> > > >   if (pos.2_25 == -1)
> >> > > > goto ;
> >> > > >   else
> >> > > > goto ;
> >> > > >
> >> > > > And then, copy propagation transforms block 8 it into
> >> > > >
> >> > > > :
> >> > > >   # prephitmp.29_89 = PHI 
> >> > > >   # pos_lsm.34_53 = PHI 
> >> > > >   ...
> >> > > >
> >> > > > And then, these two duplicated PHI stmts have been kept until
> >> > expand
> >> > > pass.
> >> > > >
> >> > > > In dom2, a stmt like below
> >> > > >
> >> > > >   # pos_lsm.34_54 = PHI 
> >> > > >
> >> > > > is transformed into
> >> > > >
> >> > > >   # pos_lsm.34_54 = PHI 
> >> > > >
> >> > > > just because they are equal.
> >> > > >
> >> > > > Unfortunately, this transformation changed back-end behavior to
> >> > > generate
> >> > > > redundant copy instructions and hurt performance. This case is
> >> from
> >> > a
> >> > > real
> >> > > > benchmark and hurt performance a lot.
> >> > > >
> >> > > > Does the fix in r182140 intend to totally clean up this kind of
> >> > > redundancy?
> >> > > > Where should we get it fixed?
> >> > > >
> >> > >
> >> > > Hi, sorry not to have responded sooner -- I just now got some time
> >> to
> >> > > look at this.
> >> > >
> >> > > I compiled your code with -O3 for revisions 182139 and 182140 of
> >> GCC
> >> > > trunk, and found no difference between the code produced by the
> >> > middle
> >> > > end for the two versions.  So something el

Re: [ARM] EABI and the default to short enums

2012-02-28 Thread Sebastian Huber

On 02/27/2012 10:33 PM, Daniel Jacobowitz wrote:

Sorry for being late to the party.

On Wed, Feb 15, 2012 at 9:55 AM, Ian Lance Taylor  wrote:

Ouch, I did not know that the EABI left this open.  That seems like a
bug, because it prevents code from being interoperable.  This is
precisely the kind of thing an ABI should address.  Does anybody know
why they did this?


It's a matter of platform variants.  There are a sufficient number of
ARM use cases where the extra bytes matter (or else, there are a
sufficient number of ARM vendors / customers who feel that it
matters).  But there's also cases like Linux where the advantages of
int-sized enums outweigh the space cost.  So the platform ABI
supplement is supposed to decide.

I believe that the Linux variant has other deviations from base than
just this.  The one I remember in particular is TLS models but there
may be others.  Please check the full range of differences before you
decide which would be a better base for RTEMS.



Thanks for the comment.  I just figured out, that in GCC 4.7 
ARM_ABI_AAPCS_LINUX is used for other stuff (e.g. Linux kernel support for 
atomic operations) and not only the enums.  Newlib makes also problems (some 
files of libc are compilied with -fshort-enums).  All in all its probably 
better to fix the XDR library and other things (the assumption that an enum is 
an int is quite common).


--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.


Re: [WIP PATCH] Re: Inefficient end-of-loop value computation - missed optimization somewhere?

2012-02-28 Thread Richard Guenther
On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand  wrote:
> Richard Guenther wrote:
>> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand  wrote:
>> > we've noticed that the loop optimizer sometimes inserts weirdly
>> > inefficient code to compute the value of an induction variable
>> > at the end of the loop.
> [snip]
>
>> The issue is that (start + 1) + 1 * (int) ... is computed using signed
>> integer arithmetic which may invoke undefined behavior when it wraps.
>> Thus we cannot re-associate it.  I suppose if you'd arrange the code
>> to do the arithmetic in unsigned and only cast the result back to the
>> desired type we might fold it ...
>
> I'm not completely sure I've got the correct place for the conversions,
> but using the patch below does indeed fix the inefficient code.
>
> I'll still need to do proper testing and benchmarking, but I thought
> I'd post the patch anyway just as a heads-up ...
>
> Does this look reasonable to you?

Yes, that looks reasonable.  Though instead of unsigned_type_for
please use build_nonstandard_integer_type instead (if the type
is not already unsigned).  unsigned_type_for does not necessarily
preserve type-precision and may even return NULL.

Richard.

> Thanks,
> Ulrich
>
>
> ChangeLog:
>
>        * tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
>        Perform chrec_apply computation in an unsigned type.
>
> Index: gcc/tree-scalar-evolution.c
> ===
> --- gcc/tree-scalar-evolution.c (revision 184439)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -474,11 +474,18 @@
>            return chrec_dont_know;
>          else
>            {
> +             tree type = TREE_TYPE (evolution_fn);
> +             tree utype = unsigned_type_for (type);
>              tree res;
>
>              /* evolution_fn is the evolution function in LOOP.  Get
> -                its value in the nb_iter-th iteration.  */
> -             res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
> +                its value in the nb_iter-th iteration.  Since the
> +                number of iterations is always unsigned, we perform
> +                the computation in an unsigned type to allow for
> +                improved simplification of subexpressions.  */
> +             res = chrec_convert (utype, evolution_fn, NULL);
> +             res = chrec_apply (inner_loop->num, res, nb_iter);
> +             res = chrec_convert (type, res, NULL);
>
>              if (chrec_contains_symbols_defined_in_loop (res, loop->num))
>                res = instantiate_parameters (loop, res);
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  ulrich.weig...@de.ibm.com
>


[WIP PATCH] Re: Inefficient end-of-loop value computation - missed optimization somewhere?

2012-02-28 Thread Ulrich Weigand
Richard Guenther wrote:
> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand  wrote:
> > we've noticed that the loop optimizer sometimes inserts weirdly
> > inefficient code to compute the value of an induction variable
> > at the end of the loop.
[snip]

> The issue is that (start + 1) + 1 * (int) ... is computed using signed
> integer arithmetic which may invoke undefined behavior when it wraps.
> Thus we cannot re-associate it.  I suppose if you'd arrange the code
> to do the arithmetic in unsigned and only cast the result back to the
> desired type we might fold it ...

I'm not completely sure I've got the correct place for the conversions,
but using the patch below does indeed fix the inefficient code.

I'll still need to do proper testing and benchmarking, but I thought
I'd post the patch anyway just as a heads-up ...

Does this look reasonable to you?

Thanks,
Ulrich


ChangeLog:

* tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
Perform chrec_apply computation in an unsigned type.

Index: gcc/tree-scalar-evolution.c
===
--- gcc/tree-scalar-evolution.c (revision 184439)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -474,11 +474,18 @@
return chrec_dont_know;
  else
{
+ tree type = TREE_TYPE (evolution_fn);
+ tree utype = unsigned_type_for (type);
  tree res;
 
  /* evolution_fn is the evolution function in LOOP.  Get
-its value in the nb_iter-th iteration.  */
- res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
+its value in the nb_iter-th iteration.  Since the
+number of iterations is always unsigned, we perform
+the computation in an unsigned type to allow for
+improved simplification of subexpressions.  */
+ res = chrec_convert (utype, evolution_fn, NULL);
+ res = chrec_apply (inner_loop->num, res, nb_iter);
+ res = chrec_convert (type, res, NULL);
 
  if (chrec_contains_symbols_defined_in_loop (res, loop->num))
res = instantiate_parameters (loop, res);

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: A question about redundant PHI expression stmt

2012-02-28 Thread Richard Guenther
On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu  wrote:
>
>
>> -Original Message-
>> From: Jiangning Liu
>> Sent: Tuesday, February 28, 2012 4:07 PM
>> To: Jiangning Liu; 'William J. Schmidt'
>> Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
>> Subject: RE: A question about redundant PHI expression stmt
>>
>>
>>
>> > -Original Message-
>> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
>> Of
>> > Jiangning Liu
>> > Sent: Tuesday, February 28, 2012 11:19 AM
>> > To: 'William J. Schmidt'
>> > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
>> > Subject: RE: A question about redundant PHI expression stmt
>> >
>> >
>> >
>> > > -Original Message-
>> > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
>> Behalf
>> > Of
>> > > William J. Schmidt
>> > > Sent: Monday, February 27, 2012 11:32 PM
>> > > To: Jiangning Liu
>> > > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
>> > > Subject: Re: A question about redundant PHI expression stmt
>> > >
>> > > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
>> > > > Hi,
>> > > >
>> > > > For the small case below, there are some redundant PHI expression
>> > > stmt
>> > > > generated, and finally cause back-end generates redundant copy
>> > > instructions
>> > > > due to some reasons around IRA.
>> > > >
>> > > > int *l, *r, *g;
>> > > > void test_func(int n)
>> > > > {
>> > > >         int i;
>> > > >         static int j;
>> > > >         static int pos, direction, direction_pre;
>> > > >
>> > > >         pos = 0;
>> > > >         direction = 1;
>> > > >
>> > > >         for ( i = 0; i < n; i++ )
>> > > >         {
>> > > >                 direction_pre = direction;
>> > > >
>> > > >                 for ( j = 0; j <= 400; j++ )
>> > > >                 {
>> > > >
>> > > >                         if ( g[pos] == 200 )
>> > > >                                 break;
>> > > >
>> > > >                         if ( direction == 0 )
>> > > >                                 pos = l[pos];
>> > > >                         else
>> > > >                                 pos = r[pos];
>> > > >
>> > > >                         if ( pos == -1 )
>> > > >                         {
>> > > >                                 if ( direction_pre != direction )
>> > > >                                         break;
>> > > >
>> > > >                                 pos = 0;
>> > > >                                 direction = !direction;
>> > > >                         }
>> > > >
>> > > >                 }
>> > > >
>> > > >                 f(g[pos]);
>> > > >         }
>> > > > }
>> > > >
>> > > > I know PR39976 has something to do with this case, and check-in
>> > > r182140
>> > > > caused big degradation on the real benchmark, but I'm still
>> > confusing
>> > > around
>> > > > this issue.
>> > > >
>> > > > The procedure is like this,
>> > > >
>> > > > Loop store motion generates code below,
>> > > >
>> > > > :
>> > > >   D.4084_17 = l.4_13 + D.4077_70;
>> > > >   pos.5_18 = *D.4084_17;
>> > > >   pos_lsm.34_103 = pos.5_18;
>> > > >   goto ;
>> > > >
>> > > > :
>> > > >   D.4088_23 = r.6_19 + D.4077_70;
>> > > >   pos.7_24 = *D.4088_23;
>> > > >   pos_lsm.34_104 = pos.7_24;
>> > > >
>> > > > :
>> > > >   # prephitmp.29_89 = PHI 
>> > > >   # pos_lsm.34_53 = PHI 
>> > > >   pos.2_25 = prephitmp.29_89;
>> > > >   if (pos.2_25 == -1)
>> > > >     goto ;
>> > > >   else
>> > > >     goto ;
>> > > >
>> > > > And then, copy propagation transforms block 8 it into
>> > > >
>> > > > :
>> > > >   # prephitmp.29_89 = PHI 
>> > > >   # pos_lsm.34_53 = PHI 
>> > > >   ...
>> > > >
>> > > > And then, these two duplicated PHI stmts have been kept until
>> > expand
>> > > pass.
>> > > >
>> > > > In dom2, a stmt like below
>> > > >
>> > > >   # pos_lsm.34_54 = PHI 
>> > > >
>> > > > is transformed into
>> > > >
>> > > >   # pos_lsm.34_54 = PHI 
>> > > >
>> > > > just because they are equal.
>> > > >
>> > > > Unfortunately, this transformation changed back-end behavior to
>> > > generate
>> > > > redundant copy instructions and hurt performance. This case is
>> from
>> > a
>> > > real
>> > > > benchmark and hurt performance a lot.
>> > > >
>> > > > Does the fix in r182140 intend to totally clean up this kind of
>> > > redundancy?
>> > > > Where should we get it fixed?
>> > > >
>> > >
>> > > Hi, sorry not to have responded sooner -- I just now got some time
>> to
>> > > look at this.
>> > >
>> > > I compiled your code with -O3 for revisions 182139 and 182140 of
>> GCC
>> > > trunk, and found no difference between the code produced by the
>> > middle
>> > > end for the two versions.  So something else has apparently come
>> > along
>> > > since then that helped produce the problematic code generation for
>> > you.
>> > > Either that or I need to use different compile flags; you didn't
>> > > specify
>> > > what you used.
>> > >
>> > > The fix in r182140 does just what you saw in dom2:  identifies
>> > > duplicate
>> > > PHIs in the same block and ensures only one

RE: A question about redundant PHI expression stmt

2012-02-28 Thread Jiangning Liu


> -Original Message-
> From: Jiangning Liu
> Sent: Tuesday, February 28, 2012 4:07 PM
> To: Jiangning Liu; 'William J. Schmidt'
> Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> Subject: RE: A question about redundant PHI expression stmt
> 
> 
> 
> > -Original Message-
> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
> Of
> > Jiangning Liu
> > Sent: Tuesday, February 28, 2012 11:19 AM
> > To: 'William J. Schmidt'
> > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> > Subject: RE: A question about redundant PHI expression stmt
> >
> >
> >
> > > -Original Message-
> > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
> Behalf
> > Of
> > > William J. Schmidt
> > > Sent: Monday, February 27, 2012 11:32 PM
> > > To: Jiangning Liu
> > > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> > > Subject: Re: A question about redundant PHI expression stmt
> > >
> > > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
> > > > Hi,
> > > >
> > > > For the small case below, there are some redundant PHI expression
> > > stmt
> > > > generated, and finally cause back-end generates redundant copy
> > > instructions
> > > > due to some reasons around IRA.
> > > >
> > > > int *l, *r, *g;
> > > > void test_func(int n)
> > > > {
> > > > int i;
> > > > static int j;
> > > > static int pos, direction, direction_pre;
> > > >
> > > > pos = 0;
> > > > direction = 1;
> > > >
> > > > for ( i = 0; i < n; i++ )
> > > > {
> > > > direction_pre = direction;
> > > >
> > > > for ( j = 0; j <= 400; j++ )
> > > > {
> > > >
> > > > if ( g[pos] == 200 )
> > > > break;
> > > >
> > > > if ( direction == 0 )
> > > > pos = l[pos];
> > > > else
> > > > pos = r[pos];
> > > >
> > > > if ( pos == -1 )
> > > > {
> > > > if ( direction_pre != direction )
> > > > break;
> > > >
> > > > pos = 0;
> > > > direction = !direction;
> > > > }
> > > >
> > > > }
> > > >
> > > > f(g[pos]);
> > > > }
> > > > }
> > > >
> > > > I know PR39976 has something to do with this case, and check-in
> > > r182140
> > > > caused big degradation on the real benchmark, but I'm still
> > confusing
> > > around
> > > > this issue.
> > > >
> > > > The procedure is like this,
> > > >
> > > > Loop store motion generates code below,
> > > >
> > > > :
> > > >   D.4084_17 = l.4_13 + D.4077_70;
> > > >   pos.5_18 = *D.4084_17;
> > > >   pos_lsm.34_103 = pos.5_18;
> > > >   goto ;
> > > >
> > > > :
> > > >   D.4088_23 = r.6_19 + D.4077_70;
> > > >   pos.7_24 = *D.4088_23;
> > > >   pos_lsm.34_104 = pos.7_24;
> > > >
> > > > :
> > > >   # prephitmp.29_89 = PHI 
> > > >   # pos_lsm.34_53 = PHI 
> > > >   pos.2_25 = prephitmp.29_89;
> > > >   if (pos.2_25 == -1)
> > > > goto ;
> > > >   else
> > > > goto ;
> > > >
> > > > And then, copy propagation transforms block 8 it into
> > > >
> > > > :
> > > >   # prephitmp.29_89 = PHI 
> > > >   # pos_lsm.34_53 = PHI 
> > > >   ...
> > > >
> > > > And then, these two duplicated PHI stmts have been kept until
> > expand
> > > pass.
> > > >
> > > > In dom2, a stmt like below
> > > >
> > > >   # pos_lsm.34_54 = PHI 
> > > >
> > > > is transformed into
> > > >
> > > >   # pos_lsm.34_54 = PHI 
> > > >
> > > > just because they are equal.
> > > >
> > > > Unfortunately, this transformation changed back-end behavior to
> > > generate
> > > > redundant copy instructions and hurt performance. This case is
> from
> > a
> > > real
> > > > benchmark and hurt performance a lot.
> > > >
> > > > Does the fix in r182140 intend to totally clean up this kind of
> > > redundancy?
> > > > Where should we get it fixed?
> > > >
> > >
> > > Hi, sorry not to have responded sooner -- I just now got some time
> to
> > > look at this.
> > >
> > > I compiled your code with -O3 for revisions 182139 and 182140 of
> GCC
> > > trunk, and found no difference between the code produced by the
> > middle
> > > end for the two versions.  So something else has apparently come
> > along
> > > since then that helped produce the problematic code generation for
> > you.
> > > Either that or I need to use different compile flags; you didn't
> > > specify
> > > what you used.
> > >
> > > The fix in r182140 does just what you saw in dom2:  identifies
> > > duplicate
> > > PHIs in the same block and ensures only one of them is used.  This
> > > actually avoids inserting extra blocks during expand in certain
> loop
> > > cases.  I am not sure why you are getting redundant copies as a
> > result,
> > > but it sounds from your comm

RE: A question about redundant PHI expression stmt

2012-02-28 Thread Jiangning Liu


> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Jiangning Liu
> Sent: Tuesday, February 28, 2012 11:19 AM
> To: 'William J. Schmidt'
> Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> Subject: RE: A question about redundant PHI expression stmt
> 
> 
> 
> > -Original Message-
> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
> Of
> > William J. Schmidt
> > Sent: Monday, February 27, 2012 11:32 PM
> > To: Jiangning Liu
> > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
> > Subject: Re: A question about redundant PHI expression stmt
> >
> > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
> > > Hi,
> > >
> > > For the small case below, there are some redundant PHI expression
> > stmt
> > > generated, and finally cause back-end generates redundant copy
> > instructions
> > > due to some reasons around IRA.
> > >
> > > int *l, *r, *g;
> > > void test_func(int n)
> > > {
> > >   int i;
> > >   static int j;
> > >   static int pos, direction, direction_pre;
> > >
> > >   pos = 0;
> > >   direction = 1;
> > >
> > >   for ( i = 0; i < n; i++ )
> > >   {
> > >   direction_pre = direction;
> > >
> > >   for ( j = 0; j <= 400; j++ )
> > >   {
> > >
> > >   if ( g[pos] == 200 )
> > >   break;
> > >
> > >   if ( direction == 0 )
> > >   pos = l[pos];
> > >   else
> > >   pos = r[pos];
> > >
> > >   if ( pos == -1 )
> > >   {
> > >   if ( direction_pre != direction )
> > >   break;
> > >
> > >   pos = 0;
> > >   direction = !direction;
> > >   }
> > >
> > >   }
> > >
> > >   f(g[pos]);
> > >   }
> > > }
> > >
> > > I know PR39976 has something to do with this case, and check-in
> > r182140
> > > caused big degradation on the real benchmark, but I'm still
> confusing
> > around
> > > this issue.
> > >
> > > The procedure is like this,
> > >
> > > Loop store motion generates code below,
> > >
> > > :
> > >   D.4084_17 = l.4_13 + D.4077_70;
> > >   pos.5_18 = *D.4084_17;
> > >   pos_lsm.34_103 = pos.5_18;
> > >   goto ;
> > >
> > > :
> > >   D.4088_23 = r.6_19 + D.4077_70;
> > >   pos.7_24 = *D.4088_23;
> > >   pos_lsm.34_104 = pos.7_24;
> > >
> > > :
> > >   # prephitmp.29_89 = PHI 
> > >   # pos_lsm.34_53 = PHI 
> > >   pos.2_25 = prephitmp.29_89;
> > >   if (pos.2_25 == -1)
> > > goto ;
> > >   else
> > > goto ;
> > >
> > > And then, copy propagation transforms block 8 it into
> > >
> > > :
> > >   # prephitmp.29_89 = PHI 
> > >   # pos_lsm.34_53 = PHI 
> > >   ...
> > >
> > > And then, these two duplicated PHI stmts have been kept until
> expand
> > pass.
> > >
> > > In dom2, a stmt like below
> > >
> > >   # pos_lsm.34_54 = PHI 
> > >
> > > is transformed into
> > >
> > >   # pos_lsm.34_54 = PHI 
> > >
> > > just because they are equal.
> > >
> > > Unfortunately, this transformation changed back-end behavior to
> > generate
> > > redundant copy instructions and hurt performance. This case is from
> a
> > real
> > > benchmark and hurt performance a lot.
> > >
> > > Does the fix in r182140 intend to totally clean up this kind of
> > redundancy?
> > > Where should we get it fixed?
> > >
> >
> > Hi, sorry not to have responded sooner -- I just now got some time to
> > look at this.
> >
> > I compiled your code with -O3 for revisions 182139 and 182140 of GCC
> > trunk, and found no difference between the code produced by the
> middle
> > end for the two versions.  So something else has apparently come
> along
> > since then that helped produce the problematic code generation for
> you.
> > Either that or I need to use different compile flags; you didn't
> > specify
> > what you used.
> >
> > The fix in r182140 does just what you saw in dom2:  identifies
> > duplicate
> > PHIs in the same block and ensures only one of them is used.  This
> > actually avoids inserting extra blocks during expand in certain loop
> > cases.  I am not sure why you are getting redundant copies as a
> result,
> > but it sounds from your comments like IRA didn't coalesce a register
> > copy or something like that.  You may want to bisect revisions on the
> > trunk to see where your bad code generation started to occur to get a
> > better handle on what happened.
> >
> > As Richard said, the dom pass is likely to be removed someday,
> whenever
> > someone can get around to it.  My redundant-phi band-aid in dom would
> > go
> > away then as well, but presumably an extra pass of PRE would replace
> it,
> > and redundant PHIs would still be removed.
> >
> 
> Bill,
> 
> Thanks a lot for your explanation!
> 
> The bug could be exposed with -O2 if you apply the check-in r184435
> made by Richard.
> 
> BTW, at present is there anybody working on the NEW pass