RE: A question about redundant PHI expression stmt

2012-02-28 Thread Jiangning Liu


 -Original Message-
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 Jiangning Liu
 Sent: Tuesday, February 28, 2012 11:19 AM
 To: 'William J. Schmidt'
 Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
 Subject: RE: A question about redundant PHI expression stmt
 
 
 
  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  William J. Schmidt
  Sent: Monday, February 27, 2012 11:32 PM
  To: Jiangning Liu
  Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
  Subject: Re: A question about redundant PHI expression stmt
 
  On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
   Hi,
  
   For the small case below, there are some redundant PHI expression
  stmt
   generated, and finally cause back-end generates redundant copy
  instructions
   due to some reasons around IRA.
  
   int *l, *r, *g;
   void test_func(int n)
   {
 int i;
 static int j;
 static int pos, direction, direction_pre;
  
 pos = 0;
 direction = 1;
  
 for ( i = 0; i  n; i++ )
 {
 direction_pre = direction;
  
 for ( j = 0; j = 400; j++ )
 {
  
 if ( g[pos] == 200 )
 break;
  
 if ( direction == 0 )
 pos = l[pos];
 else
 pos = r[pos];
  
 if ( pos == -1 )
 {
 if ( direction_pre != direction )
 break;
  
 pos = 0;
 direction = !direction;
 }
  
 }
  
 f(g[pos]);
 }
   }
  
   I know PR39976 has something to do with this case, and check-in
  r182140
   caused big degradation on the real benchmark, but I'm still
 confusing
  around
   this issue.
  
   The procedure is like this,
  
   Loop store motion generates code below,
  
   bb 6:
 D.4084_17 = l.4_13 + D.4077_70;
 pos.5_18 = *D.4084_17;
 pos_lsm.34_103 = pos.5_18;
 goto bb 8;
  
   bb 7:
 D.4088_23 = r.6_19 + D.4077_70;
 pos.7_24 = *D.4088_23;
 pos_lsm.34_104 = pos.7_24;
  
   bb 8:
 # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
 # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
 pos.2_25 = prephitmp.29_89;
 if (pos.2_25 == -1)
   goto bb 9;
 else
   goto bb 20;
  
   And then, copy propagation transforms block 8 it into
  
   bb 8:
 # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
 # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
 ...
  
   And then, these two duplicated PHI stmts have been kept until
 expand
  pass.
  
   In dom2, a stmt like below
  
 # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)
  
   is transformed into
  
 # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)
  
   just because they are equal.
  
   Unfortunately, this transformation changed back-end behavior to
  generate
   redundant copy instructions and hurt performance. This case is from
 a
  real
   benchmark and hurt performance a lot.
  
   Does the fix in r182140 intend to totally clean up this kind of
  redundancy?
   Where should we get it fixed?
  
 
  Hi, sorry not to have responded sooner -- I just now got some time to
  look at this.
 
  I compiled your code with -O3 for revisions 182139 and 182140 of GCC
  trunk, and found no difference between the code produced by the
 middle
  end for the two versions.  So something else has apparently come
 along
  since then that helped produce the problematic code generation for
 you.
  Either that or I need to use different compile flags; you didn't
  specify
  what you used.
 
  The fix in r182140 does just what you saw in dom2:  identifies
  duplicate
  PHIs in the same block and ensures only one of them is used.  This
  actually avoids inserting extra blocks during expand in certain loop
  cases.  I am not sure why you are getting redundant copies as a
 result,
  but it sounds from your comments like IRA didn't coalesce a register
  copy or something like that.  You may want to bisect revisions on the
  trunk to see where your bad code generation started to occur to get a
  better handle on what happened.
 
  As Richard said, the dom pass is likely to be removed someday,
 whenever
  someone can get around to it.  My redundant-phi band-aid in dom would
  go
  away then as well, but presumably an extra pass of PRE would replace
 it,
  and redundant PHIs would still be removed.
 
 
 Bill,
 
 Thanks a lot for your explanation!
 
 The bug could be exposed with -O2 if you apply the check-in r184435
 made by Richard.
 
 BTW, at present is there anybody working on the NEW pass replacing dom?
 If no, in short term, it would be good to still get it fixed just
 because it is hurting benchmark. If Yes, may I know what that NEW pass
 will be focusing on?
 

With dump enabled, I do see a PHI copy

RE: A question about redundant PHI expression stmt

2012-02-28 Thread Jiangning Liu


 -Original Message-
 From: Jiangning Liu
 Sent: Tuesday, February 28, 2012 4:07 PM
 To: Jiangning Liu; 'William J. Schmidt'
 Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
 Subject: RE: A question about redundant PHI expression stmt
 
 
 
  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  Jiangning Liu
  Sent: Tuesday, February 28, 2012 11:19 AM
  To: 'William J. Schmidt'
  Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
  Subject: RE: A question about redundant PHI expression stmt
 
 
 
   -Original Message-
   From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
 Behalf
  Of
   William J. Schmidt
   Sent: Monday, February 27, 2012 11:32 PM
   To: Jiangning Liu
   Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
   Subject: Re: A question about redundant PHI expression stmt
  
   On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
Hi,
   
For the small case below, there are some redundant PHI expression
   stmt
generated, and finally cause back-end generates redundant copy
   instructions
due to some reasons around IRA.
   
int *l, *r, *g;
void test_func(int n)
{
int i;
static int j;
static int pos, direction, direction_pre;
   
pos = 0;
direction = 1;
   
for ( i = 0; i  n; i++ )
{
direction_pre = direction;
   
for ( j = 0; j = 400; j++ )
{
   
if ( g[pos] == 200 )
break;
   
if ( direction == 0 )
pos = l[pos];
else
pos = r[pos];
   
if ( pos == -1 )
{
if ( direction_pre != direction )
break;
   
pos = 0;
direction = !direction;
}
   
}
   
f(g[pos]);
}
}
   
I know PR39976 has something to do with this case, and check-in
   r182140
caused big degradation on the real benchmark, but I'm still
  confusing
   around
this issue.
   
The procedure is like this,
   
Loop store motion generates code below,
   
bb 6:
  D.4084_17 = l.4_13 + D.4077_70;
  pos.5_18 = *D.4084_17;
  pos_lsm.34_103 = pos.5_18;
  goto bb 8;
   
bb 7:
  D.4088_23 = r.6_19 + D.4077_70;
  pos.7_24 = *D.4088_23;
  pos_lsm.34_104 = pos.7_24;
   
bb 8:
  # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
  # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
  pos.2_25 = prephitmp.29_89;
  if (pos.2_25 == -1)
goto bb 9;
  else
goto bb 20;
   
And then, copy propagation transforms block 8 it into
   
bb 8:
  # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
  # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
  ...
   
And then, these two duplicated PHI stmts have been kept until
  expand
   pass.
   
In dom2, a stmt like below
   
  # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)
   
is transformed into
   
  # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)
   
just because they are equal.
   
Unfortunately, this transformation changed back-end behavior to
   generate
redundant copy instructions and hurt performance. This case is
 from
  a
   real
benchmark and hurt performance a lot.
   
Does the fix in r182140 intend to totally clean up this kind of
   redundancy?
Where should we get it fixed?
   
  
   Hi, sorry not to have responded sooner -- I just now got some time
 to
   look at this.
  
   I compiled your code with -O3 for revisions 182139 and 182140 of
 GCC
   trunk, and found no difference between the code produced by the
  middle
   end for the two versions.  So something else has apparently come
  along
   since then that helped produce the problematic code generation for
  you.
   Either that or I need to use different compile flags; you didn't
   specify
   what you used.
  
   The fix in r182140 does just what you saw in dom2:  identifies
   duplicate
   PHIs in the same block and ensures only one of them is used.  This
   actually avoids inserting extra blocks during expand in certain
 loop
   cases.  I am not sure why you are getting redundant copies as a
  result,
   but it sounds from your comments like IRA didn't coalesce a
 register
   copy or something like that.  You may want to bisect revisions on
 the
   trunk to see where your bad code generation started to occur to get
 a
   better handle on what happened.
  
   As Richard said, the dom pass is likely to be removed someday,
  whenever
   someone can get around to it.  My redundant-phi band-aid

Re: A question about redundant PHI expression stmt

2012-02-28 Thread Richard Guenther
On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu jiangning@arm.com wrote:


 -Original Message-
 From: Jiangning Liu
 Sent: Tuesday, February 28, 2012 4:07 PM
 To: Jiangning Liu; 'William J. Schmidt'
 Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
 Subject: RE: A question about redundant PHI expression stmt



  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  Jiangning Liu
  Sent: Tuesday, February 28, 2012 11:19 AM
  To: 'William J. Schmidt'
  Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
  Subject: RE: A question about redundant PHI expression stmt
 
 
 
   -Original Message-
   From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
 Behalf
  Of
   William J. Schmidt
   Sent: Monday, February 27, 2012 11:32 PM
   To: Jiangning Liu
   Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
   Subject: Re: A question about redundant PHI expression stmt
  
   On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
Hi,
   
For the small case below, there are some redundant PHI expression
   stmt
generated, and finally cause back-end generates redundant copy
   instructions
due to some reasons around IRA.
   
int *l, *r, *g;
void test_func(int n)
{
        int i;
        static int j;
        static int pos, direction, direction_pre;
   
        pos = 0;
        direction = 1;
   
        for ( i = 0; i  n; i++ )
        {
                direction_pre = direction;
   
                for ( j = 0; j = 400; j++ )
                {
   
                        if ( g[pos] == 200 )
                                break;
   
                        if ( direction == 0 )
                                pos = l[pos];
                        else
                                pos = r[pos];
   
                        if ( pos == -1 )
                        {
                                if ( direction_pre != direction )
                                        break;
   
                                pos = 0;
                                direction = !direction;
                        }
   
                }
   
                f(g[pos]);
        }
}
   
I know PR39976 has something to do with this case, and check-in
   r182140
caused big degradation on the real benchmark, but I'm still
  confusing
   around
this issue.
   
The procedure is like this,
   
Loop store motion generates code below,
   
bb 6:
  D.4084_17 = l.4_13 + D.4077_70;
  pos.5_18 = *D.4084_17;
  pos_lsm.34_103 = pos.5_18;
  goto bb 8;
   
bb 7:
  D.4088_23 = r.6_19 + D.4077_70;
  pos.7_24 = *D.4088_23;
  pos_lsm.34_104 = pos.7_24;
   
bb 8:
  # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
  # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
  pos.2_25 = prephitmp.29_89;
  if (pos.2_25 == -1)
    goto bb 9;
  else
    goto bb 20;
   
And then, copy propagation transforms block 8 it into
   
bb 8:
  # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
  # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
  ...
   
And then, these two duplicated PHI stmts have been kept until
  expand
   pass.
   
In dom2, a stmt like below
   
  # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)
   
is transformed into
   
  # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)
   
just because they are equal.
   
Unfortunately, this transformation changed back-end behavior to
   generate
redundant copy instructions and hurt performance. This case is
 from
  a
   real
benchmark and hurt performance a lot.
   
Does the fix in r182140 intend to totally clean up this kind of
   redundancy?
Where should we get it fixed?
   
  
   Hi, sorry not to have responded sooner -- I just now got some time
 to
   look at this.
  
   I compiled your code with -O3 for revisions 182139 and 182140 of
 GCC
   trunk, and found no difference between the code produced by the
  middle
   end for the two versions.  So something else has apparently come
  along
   since then that helped produce the problematic code generation for
  you.
   Either that or I need to use different compile flags; you didn't
   specify
   what you used.
  
   The fix in r182140 does just what you saw in dom2:  identifies
   duplicate
   PHIs in the same block and ensures only one of them is used.  This
   actually avoids inserting extra blocks during expand in certain
 loop
   cases.  I am not sure why you are getting redundant copies as a
  result,
   but it sounds from your comments like IRA didn't coalesce a
 register
   copy or something like that.  You may want to bisect revisions on
 the
   trunk to see where your bad code generation started to occur to get
 a
   better handle on what happened.
  
   As Richard said, the dom pass is likely to be removed someday

Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt
On Tue, 2012-02-28 at 11:21 +0100, Richard Guenther wrote:
 On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu jiangning@arm.com wrote:
 
 
  -Original Message-
  From: Jiangning Liu
  Sent: Tuesday, February 28, 2012 4:07 PM
  To: Jiangning Liu; 'William J. Schmidt'
  Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
  Subject: RE: A question about redundant PHI expression stmt
 
 
 
   -Original Message-
   From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
  Of
   Jiangning Liu
   Sent: Tuesday, February 28, 2012 11:19 AM
   To: 'William J. Schmidt'
   Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
   Subject: RE: A question about redundant PHI expression stmt
  
  
  
-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On
  Behalf
   Of
William J. Schmidt
Sent: Monday, February 27, 2012 11:32 PM
To: Jiangning Liu
Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
Subject: Re: A question about redundant PHI expression stmt
   
On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
 Hi,

 For the small case below, there are some redundant PHI expression
stmt
 generated, and finally cause back-end generates redundant copy
instructions
 due to some reasons around IRA.

 int *l, *r, *g;
 void test_func(int n)
 {
 int i;
 static int j;
 static int pos, direction, direction_pre;

 pos = 0;
 direction = 1;

 for ( i = 0; i  n; i++ )
 {
 direction_pre = direction;

 for ( j = 0; j = 400; j++ )
 {

 if ( g[pos] == 200 )
 break;

 if ( direction == 0 )
 pos = l[pos];
 else
 pos = r[pos];

 if ( pos == -1 )
 {
 if ( direction_pre != direction )
 break;

 pos = 0;
 direction = !direction;
 }

 }

 f(g[pos]);
 }
 }

 I know PR39976 has something to do with this case, and check-in
r182140
 caused big degradation on the real benchmark, but I'm still
   confusing
around
 this issue.

 The procedure is like this,

 Loop store motion generates code below,

 bb 6:
   D.4084_17 = l.4_13 + D.4077_70;
   pos.5_18 = *D.4084_17;
   pos_lsm.34_103 = pos.5_18;
   goto bb 8;

 bb 7:
   D.4088_23 = r.6_19 + D.4077_70;
   pos.7_24 = *D.4088_23;
   pos_lsm.34_104 = pos.7_24;

 bb 8:
   # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
   # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
   pos.2_25 = prephitmp.29_89;
   if (pos.2_25 == -1)
 goto bb 9;
   else
 goto bb 20;

 And then, copy propagation transforms block 8 it into

 bb 8:
   # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
   # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
   ...

 And then, these two duplicated PHI stmts have been kept until
   expand
pass.

 In dom2, a stmt like below

   # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)

 is transformed into

   # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)

 just because they are equal.

 Unfortunately, this transformation changed back-end behavior to
generate
 redundant copy instructions and hurt performance. This case is
  from
   a
real
 benchmark and hurt performance a lot.

 Does the fix in r182140 intend to totally clean up this kind of
redundancy?
 Where should we get it fixed?

   
Hi, sorry not to have responded sooner -- I just now got some time
  to
look at this.
   
I compiled your code with -O3 for revisions 182139 and 182140 of
  GCC
trunk, and found no difference between the code produced by the
   middle
end for the two versions.  So something else has apparently come
   along
since then that helped produce the problematic code generation for
   you.
Either that or I need to use different compile flags; you didn't
specify
what you used.
   
The fix in r182140 does just what you saw in dom2:  identifies
duplicate
PHIs in the same block and ensures only one of them is used.  This
actually avoids inserting extra blocks during expand in certain
  loop
cases.  I am not sure why you are getting redundant copies as a
   result,
but it sounds from your comments like IRA didn't coalesce a
  register
copy

Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt
On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote:

 I think this is probably a problem with how cprop_into_successor_phis
 works.  It only propagates into immediate successors of a block.  In
 this case copies are propagated from bb12 into phis in bb13 and bb14 (of
 which there are none).  Seems like it could propagate into phis of all
 dominator children, which would catch bb15.  I haven't looked at this
 part of the code in too much detail, but I'm guessing since bb15's
 immediate dominator is bb12, this is the last chance to get the value
 propagated into the phi.
 
 When I get more time, I'll walk through the logic for this test to
 confirm, and post here with the results.
 
 If this is the problem, then I doubt anything will be fixed for this in
 the limited time left for 4.7.  You'd want to open a missed-optimization
 PR for it.

No, this isn't right.  I see in the dump that the copy is being removed
from the const_and_copies_table after processing block 13 and before
processing block 14.  That's presumably a bug.  I'll keep looking.

Bill



Re: A question about redundant PHI expression stmt

2012-02-28 Thread William J. Schmidt


On Tue, 2012-02-28 at 11:52 -0600, William J. Schmidt wrote:
 On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote:
 
  I think this is probably a problem with how cprop_into_successor_phis
  works.  It only propagates into immediate successors of a block.  In
  this case copies are propagated from bb12 into phis in bb13 and bb14 (of
  which there are none).  Seems like it could propagate into phis of all
  dominator children, which would catch bb15.  I haven't looked at this
  part of the code in too much detail, but I'm guessing since bb15's
  immediate dominator is bb12, this is the last chance to get the value
  propagated into the phi.
  
  When I get more time, I'll walk through the logic for this test to
  confirm, and post here with the results.
  
  If this is the problem, then I doubt anything will be fixed for this in
  the limited time left for 4.7.  You'd want to open a missed-optimization
  PR for it.
 
 No, this isn't right.  I see in the dump that the copy is being removed
 from the const_and_copies_table after processing block 13 and before
 processing block 14.  That's presumably a bug.  I'll keep looking.
 

OK, I see the problem.  We have a long-standing bug in dom's use of edge
threading, which quietly corrupts the const_and_copy stack.  I'll create
a bug report and post a patch for you to try out.  My testing shows that
the redundant phi is properly handled now, but let's see if it solves
your performance issue.

Thanks,
Bill

 Bill



Re: A question about redundant PHI expression stmt

2012-02-27 Thread William J. Schmidt
On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
 Hi,
 
 For the small case below, there are some redundant PHI expression stmt
 generated, and finally cause back-end generates redundant copy instructions
 due to some reasons around IRA.
 
 int *l, *r, *g;
 void test_func(int n)
 {
   int i;
   static int j;
   static int pos, direction, direction_pre;
 
   pos = 0;
   direction = 1;
 
   for ( i = 0; i  n; i++ )
   {
   direction_pre = direction;
 
   for ( j = 0; j = 400; j++ )
   {
 
   if ( g[pos] == 200 )
   break;
 
   if ( direction == 0 )
   pos = l[pos];
   else
   pos = r[pos];
 
   if ( pos == -1 )
   {
   if ( direction_pre != direction )
   break;
 
   pos = 0;
   direction = !direction;
   }
 
   }
 
   f(g[pos]);
   }
 }
 
 I know PR39976 has something to do with this case, and check-in r182140
 caused big degradation on the real benchmark, but I'm still confusing around
 this issue.
 
 The procedure is like this,
 
 Loop store motion generates code below,
 
 bb 6:
   D.4084_17 = l.4_13 + D.4077_70;
   pos.5_18 = *D.4084_17;
   pos_lsm.34_103 = pos.5_18;
   goto bb 8;
 
 bb 7:
   D.4088_23 = r.6_19 + D.4077_70;
   pos.7_24 = *D.4088_23;
   pos_lsm.34_104 = pos.7_24;
 
 bb 8:
   # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
   # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
   pos.2_25 = prephitmp.29_89;
   if (pos.2_25 == -1)
 goto bb 9;
   else
 goto bb 20;
 
 And then, copy propagation transforms block 8 it into 
 
 bb 8:
   # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
   # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
   ...
 
 And then, these two duplicated PHI stmts have been kept until expand pass. 
 
 In dom2, a stmt like below
 
   # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)
 
 is transformed into
 
   # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)
 
 just because they are equal.
 
 Unfortunately, this transformation changed back-end behavior to generate
 redundant copy instructions and hurt performance. This case is from a real
 benchmark and hurt performance a lot.
 
 Does the fix in r182140 intend to totally clean up this kind of redundancy?
 Where should we get it fixed?
 

Hi, sorry not to have responded sooner -- I just now got some time to
look at this.

I compiled your code with -O3 for revisions 182139 and 182140 of GCC
trunk, and found no difference between the code produced by the middle
end for the two versions.  So something else has apparently come along
since then that helped produce the problematic code generation for you.
Either that or I need to use different compile flags; you didn't specify
what you used.

The fix in r182140 does just what you saw in dom2:  identifies duplicate
PHIs in the same block and ensures only one of them is used.  This
actually avoids inserting extra blocks during expand in certain loop
cases.  I am not sure why you are getting redundant copies as a result,
but it sounds from your comments like IRA didn't coalesce a register
copy or something like that.  You may want to bisect revisions on the
trunk to see where your bad code generation started to occur to get a
better handle on what happened.

As Richard said, the dom pass is likely to be removed someday, whenever
someone can get around to it.  My redundant-phi band-aid in dom would go
away then as well, but presumably an extra pass of PRE would replace it,
and redundant PHIs would still be removed.

Thanks,
Bill

 Thanks,
 -Jiangning
 
 
 



RE: A question about redundant PHI expression stmt

2012-02-27 Thread Jiangning Liu


 -Original Message-
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 William J. Schmidt
 Sent: Monday, February 27, 2012 11:32 PM
 To: Jiangning Liu
 Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org
 Subject: Re: A question about redundant PHI expression stmt
 
 On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
  Hi,
 
  For the small case below, there are some redundant PHI expression
 stmt
  generated, and finally cause back-end generates redundant copy
 instructions
  due to some reasons around IRA.
 
  int *l, *r, *g;
  void test_func(int n)
  {
  int i;
  static int j;
  static int pos, direction, direction_pre;
 
  pos = 0;
  direction = 1;
 
  for ( i = 0; i  n; i++ )
  {
  direction_pre = direction;
 
  for ( j = 0; j = 400; j++ )
  {
 
  if ( g[pos] == 200 )
  break;
 
  if ( direction == 0 )
  pos = l[pos];
  else
  pos = r[pos];
 
  if ( pos == -1 )
  {
  if ( direction_pre != direction )
  break;
 
  pos = 0;
  direction = !direction;
  }
 
  }
 
  f(g[pos]);
  }
  }
 
  I know PR39976 has something to do with this case, and check-in
 r182140
  caused big degradation on the real benchmark, but I'm still confusing
 around
  this issue.
 
  The procedure is like this,
 
  Loop store motion generates code below,
 
  bb 6:
D.4084_17 = l.4_13 + D.4077_70;
pos.5_18 = *D.4084_17;
pos_lsm.34_103 = pos.5_18;
goto bb 8;
 
  bb 7:
D.4088_23 = r.6_19 + D.4077_70;
pos.7_24 = *D.4088_23;
pos_lsm.34_104 = pos.7_24;
 
  bb 8:
# prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
# pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
pos.2_25 = prephitmp.29_89;
if (pos.2_25 == -1)
  goto bb 9;
else
  goto bb 20;
 
  And then, copy propagation transforms block 8 it into
 
  bb 8:
# prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
# pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
...
 
  And then, these two duplicated PHI stmts have been kept until expand
 pass.
 
  In dom2, a stmt like below
 
# pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)
 
  is transformed into
 
# pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)
 
  just because they are equal.
 
  Unfortunately, this transformation changed back-end behavior to
 generate
  redundant copy instructions and hurt performance. This case is from a
 real
  benchmark and hurt performance a lot.
 
  Does the fix in r182140 intend to totally clean up this kind of
 redundancy?
  Where should we get it fixed?
 
 
 Hi, sorry not to have responded sooner -- I just now got some time to
 look at this.
 
 I compiled your code with -O3 for revisions 182139 and 182140 of GCC
 trunk, and found no difference between the code produced by the middle
 end for the two versions.  So something else has apparently come along
 since then that helped produce the problematic code generation for you.
 Either that or I need to use different compile flags; you didn't
 specify
 what you used.
 
 The fix in r182140 does just what you saw in dom2:  identifies
 duplicate
 PHIs in the same block and ensures only one of them is used.  This
 actually avoids inserting extra blocks during expand in certain loop
 cases.  I am not sure why you are getting redundant copies as a result,
 but it sounds from your comments like IRA didn't coalesce a register
 copy or something like that.  You may want to bisect revisions on the
 trunk to see where your bad code generation started to occur to get a
 better handle on what happened.
 
 As Richard said, the dom pass is likely to be removed someday, whenever
 someone can get around to it.  My redundant-phi band-aid in dom would
 go
 away then as well, but presumably an extra pass of PRE would replace it,
 and redundant PHIs would still be removed.
 

Bill,

Thanks a lot for your explanation!

The bug could be exposed with -O2 if you apply the check-in r184435 made by 
Richard.

BTW, at present is there anybody working on the NEW pass replacing dom? If no, 
in short term, it would be good to still get it fixed just because it is 
hurting benchmark. If Yes, may I know what that NEW pass will be focusing on?

Thanks,
-Jiangning

 Thanks,
 Bill
 
  Thanks,
  -Jiangning
 
 
 
 






A question about redundant PHI expression stmt

2012-02-24 Thread Jiangning Liu
Hi,

For the small case below, there are some redundant PHI expression stmt
generated, and finally cause back-end generates redundant copy instructions
due to some reasons around IRA.

int *l, *r, *g;
void test_func(int n)
{
int i;
static int j;
static int pos, direction, direction_pre;

pos = 0;
direction = 1;

for ( i = 0; i  n; i++ )
{
direction_pre = direction;

for ( j = 0; j = 400; j++ )
{

if ( g[pos] == 200 )
break;

if ( direction == 0 )
pos = l[pos];
else
pos = r[pos];

if ( pos == -1 )
{
if ( direction_pre != direction )
break;

pos = 0;
direction = !direction;
}

}

f(g[pos]);
}
}

I know PR39976 has something to do with this case, and check-in r182140
caused big degradation on the real benchmark, but I'm still confusing around
this issue.

The procedure is like this,

Loop store motion generates code below,

bb 6:
  D.4084_17 = l.4_13 + D.4077_70;
  pos.5_18 = *D.4084_17;
  pos_lsm.34_103 = pos.5_18;
  goto bb 8;

bb 7:
  D.4088_23 = r.6_19 + D.4077_70;
  pos.7_24 = *D.4088_23;
  pos_lsm.34_104 = pos.7_24;

bb 8:
  # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
  # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
  pos.2_25 = prephitmp.29_89;
  if (pos.2_25 == -1)
goto bb 9;
  else
goto bb 20;

And then, copy propagation transforms block 8 it into 

bb 8:
  # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
  # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
  ...

And then, these two duplicated PHI stmts have been kept until expand pass. 

In dom2, a stmt like below

  # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)

is transformed into

  # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)

just because they are equal.

Unfortunately, this transformation changed back-end behavior to generate
redundant copy instructions and hurt performance. This case is from a real
benchmark and hurt performance a lot.

Does the fix in r182140 intend to totally clean up this kind of redundancy?
Where should we get it fixed?

Thanks,
-Jiangning





Re: A question about redundant PHI expression stmt

2012-02-24 Thread Richard Guenther
On Fri, Feb 24, 2012 at 9:07 AM, Jiangning Liu jiangning@arm.com wrote:
 Hi,

 For the small case below, there are some redundant PHI expression stmt
 generated, and finally cause back-end generates redundant copy instructions
 due to some reasons around IRA.

 int *l, *r, *g;
 void test_func(int n)
 {
        int i;
        static int j;
        static int pos, direction, direction_pre;

        pos = 0;
        direction = 1;

        for ( i = 0; i  n; i++ )
        {
                direction_pre = direction;

                for ( j = 0; j = 400; j++ )
                {

                        if ( g[pos] == 200 )
                                break;

                        if ( direction == 0 )
                                pos = l[pos];
                        else
                                pos = r[pos];

                        if ( pos == -1 )
                        {
                                if ( direction_pre != direction )
                                        break;

                                pos = 0;
                                direction = !direction;
                        }

                }

                f(g[pos]);
        }
 }

 I know PR39976 has something to do with this case, and check-in r182140
 caused big degradation on the real benchmark, but I'm still confusing around
 this issue.

 The procedure is like this,

 Loop store motion generates code below,

 bb 6:
  D.4084_17 = l.4_13 + D.4077_70;
  pos.5_18 = *D.4084_17;
  pos_lsm.34_103 = pos.5_18;
  goto bb 8;

 bb 7:
  D.4088_23 = r.6_19 + D.4077_70;
  pos.7_24 = *D.4088_23;
  pos_lsm.34_104 = pos.7_24;

 bb 8:
  # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7)
  # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7)
  pos.2_25 = prephitmp.29_89;
  if (pos.2_25 == -1)
    goto bb 9;
  else
    goto bb 20;

 And then, copy propagation transforms block 8 it into

 bb 8:
  # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12)
  # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12)
  ...

 And then, these two duplicated PHI stmts have been kept until expand pass.

 In dom2, a stmt like below

  # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16)

 is transformed into

  # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16)

 just because they are equal.

 Unfortunately, this transformation changed back-end behavior to generate
 redundant copy instructions and hurt performance. This case is from a real
 benchmark and hurt performance a lot.

 Does the fix in r182140 intend to totally clean up this kind of redundancy?
 Where should we get it fixed?

FRE/PRE are currently the only passes that remove redundant PHI nodes.
DOM can be trivially extended to do the same (I _think_ I had a patch for
this somewhere ... but in the end I think DOM should go, one VN is enough).

Richard.

 Thanks,
 -Jiangning