RE: A question about redundant PHI expression stmt
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, February 28, 2012 11:19 AM To: 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of William J. Schmidt Sent: Monday, February 27, 2012 11:32 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: Re: A question about redundant PHI expression stmt On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy or something like that. You may want to bisect revisions on the trunk to see where your bad code generation started to occur to get a better handle on what happened. As Richard said, the dom pass is likely to be removed someday, whenever someone can get around to it. My redundant-phi band-aid in dom would go away then as well, but presumably an extra pass of PRE would replace it, and redundant PHIs would still be removed. Bill, Thanks a lot for your explanation! The bug could be exposed with -O2 if you apply the check-in r184435 made by Richard. BTW, at present is there anybody working on the NEW pass replacing dom? If no, in short term, it would be good to still get it fixed just because it is hurting benchmark. If Yes, may I know what that NEW pass will be focusing on? With dump enabled, I do see a PHI copy
RE: A question about redundant PHI expression stmt
-Original Message- From: Jiangning Liu Sent: Tuesday, February 28, 2012 4:07 PM To: Jiangning Liu; 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, February 28, 2012 11:19 AM To: 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of William J. Schmidt Sent: Monday, February 27, 2012 11:32 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: Re: A question about redundant PHI expression stmt On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy or something like that. You may want to bisect revisions on the trunk to see where your bad code generation started to occur to get a better handle on what happened. As Richard said, the dom pass is likely to be removed someday, whenever someone can get around to it. My redundant-phi band-aid
Re: A question about redundant PHI expression stmt
On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Jiangning Liu Sent: Tuesday, February 28, 2012 4:07 PM To: Jiangning Liu; 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, February 28, 2012 11:19 AM To: 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of William J. Schmidt Sent: Monday, February 27, 2012 11:32 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: Re: A question about redundant PHI expression stmt On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy or something like that. You may want to bisect revisions on the trunk to see where your bad code generation started to occur to get a better handle on what happened. As Richard said, the dom pass is likely to be removed someday
Re: A question about redundant PHI expression stmt
On Tue, 2012-02-28 at 11:21 +0100, Richard Guenther wrote: On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Jiangning Liu Sent: Tuesday, February 28, 2012 4:07 PM To: Jiangning Liu; 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, February 28, 2012 11:19 AM To: 'William J. Schmidt' Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: RE: A question about redundant PHI expression stmt -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of William J. Schmidt Sent: Monday, February 27, 2012 11:32 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: Re: A question about redundant PHI expression stmt On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy
Re: A question about redundant PHI expression stmt
On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote: I think this is probably a problem with how cprop_into_successor_phis works. It only propagates into immediate successors of a block. In this case copies are propagated from bb12 into phis in bb13 and bb14 (of which there are none). Seems like it could propagate into phis of all dominator children, which would catch bb15. I haven't looked at this part of the code in too much detail, but I'm guessing since bb15's immediate dominator is bb12, this is the last chance to get the value propagated into the phi. When I get more time, I'll walk through the logic for this test to confirm, and post here with the results. If this is the problem, then I doubt anything will be fixed for this in the limited time left for 4.7. You'd want to open a missed-optimization PR for it. No, this isn't right. I see in the dump that the copy is being removed from the const_and_copies_table after processing block 13 and before processing block 14. That's presumably a bug. I'll keep looking. Bill
Re: A question about redundant PHI expression stmt
On Tue, 2012-02-28 at 11:52 -0600, William J. Schmidt wrote: On Tue, 2012-02-28 at 11:03 -0600, William J. Schmidt wrote: I think this is probably a problem with how cprop_into_successor_phis works. It only propagates into immediate successors of a block. In this case copies are propagated from bb12 into phis in bb13 and bb14 (of which there are none). Seems like it could propagate into phis of all dominator children, which would catch bb15. I haven't looked at this part of the code in too much detail, but I'm guessing since bb15's immediate dominator is bb12, this is the last chance to get the value propagated into the phi. When I get more time, I'll walk through the logic for this test to confirm, and post here with the results. If this is the problem, then I doubt anything will be fixed for this in the limited time left for 4.7. You'd want to open a missed-optimization PR for it. No, this isn't right. I see in the dump that the copy is being removed from the const_and_copies_table after processing block 13 and before processing block 14. That's presumably a bug. I'll keep looking. OK, I see the problem. We have a long-standing bug in dom's use of edge threading, which quietly corrupts the const_and_copy stack. I'll create a bug report and post a patch for you to try out. My testing shows that the redundant phi is properly handled now, but let's see if it solves your performance issue. Thanks, Bill Bill
Re: A question about redundant PHI expression stmt
On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy or something like that. You may want to bisect revisions on the trunk to see where your bad code generation started to occur to get a better handle on what happened. As Richard said, the dom pass is likely to be removed someday, whenever someone can get around to it. My redundant-phi band-aid in dom would go away then as well, but presumably an extra pass of PRE would replace it, and redundant PHIs would still be removed. Thanks, Bill Thanks, -Jiangning
RE: A question about redundant PHI expression stmt
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of William J. Schmidt Sent: Monday, February 27, 2012 11:32 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org Subject: Re: A question about redundant PHI expression stmt On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Hi, sorry not to have responded sooner -- I just now got some time to look at this. I compiled your code with -O3 for revisions 182139 and 182140 of GCC trunk, and found no difference between the code produced by the middle end for the two versions. So something else has apparently come along since then that helped produce the problematic code generation for you. Either that or I need to use different compile flags; you didn't specify what you used. The fix in r182140 does just what you saw in dom2: identifies duplicate PHIs in the same block and ensures only one of them is used. This actually avoids inserting extra blocks during expand in certain loop cases. I am not sure why you are getting redundant copies as a result, but it sounds from your comments like IRA didn't coalesce a register copy or something like that. You may want to bisect revisions on the trunk to see where your bad code generation started to occur to get a better handle on what happened. As Richard said, the dom pass is likely to be removed someday, whenever someone can get around to it. My redundant-phi band-aid in dom would go away then as well, but presumably an extra pass of PRE would replace it, and redundant PHIs would still be removed. Bill, Thanks a lot for your explanation! The bug could be exposed with -O2 if you apply the check-in r184435 made by Richard. BTW, at present is there anybody working on the NEW pass replacing dom? If no, in short term, it would be good to still get it fixed just because it is hurting benchmark. If Yes, may I know what that NEW pass will be focusing on? Thanks, -Jiangning Thanks, Bill Thanks, -Jiangning
A question about redundant PHI expression stmt
Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? Thanks, -Jiangning
Re: A question about redundant PHI expression stmt
On Fri, Feb 24, 2012 at 9:07 AM, Jiangning Liu jiangning@arm.com wrote: Hi, For the small case below, there are some redundant PHI expression stmt generated, and finally cause back-end generates redundant copy instructions due to some reasons around IRA. int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i n; i++ ) { direction_pre = direction; for ( j = 0; j = 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } I know PR39976 has something to do with this case, and check-in r182140 caused big degradation on the real benchmark, but I'm still confusing around this issue. The procedure is like this, Loop store motion generates code below, bb 6: D.4084_17 = l.4_13 + D.4077_70; pos.5_18 = *D.4084_17; pos_lsm.34_103 = pos.5_18; goto bb 8; bb 7: D.4088_23 = r.6_19 + D.4077_70; pos.7_24 = *D.4088_23; pos_lsm.34_104 = pos.7_24; bb 8: # prephitmp.29_89 = PHI pos.5_18(6), pos.7_24(7) # pos_lsm.34_53 = PHI pos_lsm.34_103(6), pos_lsm.34_104(7) pos.2_25 = prephitmp.29_89; if (pos.2_25 == -1) goto bb 9; else goto bb 20; And then, copy propagation transforms block 8 it into bb 8: # prephitmp.29_89 = PHI pos.5_18(11), pos.7_24(12) # pos_lsm.34_53 = PHI pos.5_18(11), pos.7_24(12) ... And then, these two duplicated PHI stmts have been kept until expand pass. In dom2, a stmt like below # pos_lsm.34_54 = PHI pos_lsm.34_53(13), 0(16) is transformed into # pos_lsm.34_54 = PHI prephitmp.29_89(13), 0(16) just because they are equal. Unfortunately, this transformation changed back-end behavior to generate redundant copy instructions and hurt performance. This case is from a real benchmark and hurt performance a lot. Does the fix in r182140 intend to totally clean up this kind of redundancy? Where should we get it fixed? FRE/PRE are currently the only passes that remove redundant PHI nodes. DOM can be trivially extended to do the same (I _think_ I had a patch for this somewhere ... but in the end I think DOM should go, one VN is enough). Richard. Thanks, -Jiangning