RE: A case exposing code sink issue
In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Richard, Thanks a lot for your explanation! Any comments for this question as well? Does GCC intends to keep a[i] until expand pass? Any special reason? If I change CHREC algorithm, I see ivopt would have done this lowering, so we wouldn't see a[i] in expand pass. Any hurt? Thanks, -Jiangning
Re: A case exposing code sink issue
On Thu, Jan 5, 2012 at 9:34 AM, Jiangning Liu jiangning@arm.com wrote: In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Richard, Thanks a lot for your explanation! Any comments for this question as well? Does GCC intends to keep a[i] until expand pass? It's kept as it suits - it's shorter and easier to analyze (we know the implicit multiplication won't overflow) Any special reason? If I change CHREC algorithm, I see ivopt would have done this lowering, so we wouldn't see a[i] in expand pass. Any hurt? No, I think changing the CHREC algorithm is ok (I didn't look closely at your patch). This is stage1 material though. Thanks, Richard. Thanks, -Jiangning
RE: A case exposing code sink issue
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Thursday, January 05, 2012 5:32 PM To: Jiangning Liu Cc: Michael Matz; gcc@gcc.gnu.org Subject: Re: A case exposing code sink issue On Thu, Jan 5, 2012 at 9:34 AM, Jiangning Liu jiangning@arm.com wrote: In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Richard, Thanks a lot for your explanation! Any comments for this question as well? Does GCC intends to keep a[i] until expand pass? It's kept as it suits - it's shorter and easier to analyze (we know the implicit multiplication won't overflow) Any special reason? If I change CHREC algorithm, I see ivopt would have done this lowering, so we wouldn't see a[i] in expand pass. Any hurt? No, I think changing the CHREC algorithm is ok (I didn't look closely at your patch). This is stage1 material though. Actually I didn't send it out at all. :-) And I just send out a RFC here http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00236.html, can you have a look? Thanks, Richard. Thanks, -Jiangning
Re: A case exposing code sink issue
On Thu, Dec 29, 2011 at 9:50 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Jiangning Liu Sent: Wednesday, December 28, 2011 5:38 PM To: Jiangning Liu; 'Richard Guenther' Cc: Michael Matz; gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, December 27, 2011 5:10 PM To: 'Richard Guenther' Cc: Michael Matz; gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)). Richard, In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC algorithm to get it calculated? Appreciate your kindly help in advance! Richard, Now I have a patch working for the case of step i++, by directly modifying scalar evolution algorithm. the following code would be generated after SCCP, l # i_13 = PHI i_6(7), k_2(D)(4) a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + 1; if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = MEM[(void *)a + 3996B]; a_p = a_p_lsm.5_11; goto bb 3; It looks good, but I still have problem when the case has step i+=k. For this case the value of variable i exiting loop isn't invariant, the algorithm below in scalar evolution doesn't work on it, compute_overall_effect_of_inner_loop() { ... tree nb_iter = number_of_latch_executions (inner_loop); if (nb_iter == chrec_dont_know) return chrec_dont_know; else { tree res; /* evolution_fn is the evolution function in LOOP. Get its value in the nb_iter-th iteration. */ res = chrec_apply (inner_loop-num, evolution_fn, nb_iter); if (chrec_contains_symbols_defined_in_loop (res, loop-num)) res = instantiate_parameters (loop, res); /* Continue the computation until ending on a parent of LOOP. */ return compute_overall_effect_of_inner_loop (loop, res); } } In theory, we can still have the transformation like below even if the step is i+=k, # i_13 = PHI i_6(7), k_2(D)(4) i_14 = i_13, a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + k_2(D); // i+=k if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = a[i_14]; a_p = a_p_lsm.5_11; goto bb 3; But I realize this is not a loop closed SSA form at all, because i_14 is being used out of the loop. Where could we extend the liverange of variable i in GCC infrastructure and finally solve this problem? It seems many people are still in the happy of the upcoming 2012 New Year. :-) Following my story, I find the following code in tree-ssa-copy.c /* Avoid copy propagation from an inner into an outer loop. Otherwise, this may move loop variant variables outside of their loops and prevent coalescing opportunities. If the value was loop invariant, it will be hoisted by LICM and exposed for copy propagation. ??? The value will be always loop invariant. In loop-closed SSA form do not copy-propagate through PHI nodes in blocks with a loop exit edge predecessor. */ if (current_loops TREE_CODE (arg_value) == SSA_NAME (loop_depth_of_name (arg_value) loop_depth_of_name (lhs) || (loops_state_satisfies_p (LOOP_CLOSED_SSA) loop_exit_edge_p (e-src-loop_father, e { phi_val.value = lhs; break; } Here http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00066.html, Dan said The original check was not because of coalescing, but because we would copy prop in-loop variables outside the loop, causing *more* invariantness in nested loops. Can anybody give me a concrete example to explain this statement? I can neither make sense of the code nor of Dans comment. But of course the part about loop-closed-SSA is still true. I _suppose_ that the situation is like tem = stuff; for () x = tem; foo = x; and Dan expects LIM to hoist x = tem (instead of sinking it by copyprop). Then we'd coalesce tem and x (but it would be still live across the loop), instead of coalescing x and foo (keeping tem live over the loop). LIM of course does not hoist this kind of stuff because it appears too cheap. Anyway, for my simple case, I don't see bad thing would happen when propagate a[i] out
RE: A case exposing code sink issue
-Original Message- From: Jiangning Liu Sent: Wednesday, December 28, 2011 5:38 PM To: Jiangning Liu; 'Richard Guenther' Cc: Michael Matz; gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, December 27, 2011 5:10 PM To: 'Richard Guenther' Cc: Michael Matz; gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)). Richard, In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC algorithm to get it calculated? Appreciate your kindly help in advance! Richard, Now I have a patch working for the case of step i++, by directly modifying scalar evolution algorithm. the following code would be generated after SCCP, l # i_13 = PHI i_6(7), k_2(D)(4) a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + 1; if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = MEM[(void *)a + 3996B]; a_p = a_p_lsm.5_11; goto bb 3; It looks good, but I still have problem when the case has step i+=k. For this case the value of variable i exiting loop isn't invariant, the algorithm below in scalar evolution doesn't work on it, compute_overall_effect_of_inner_loop() { ... tree nb_iter = number_of_latch_executions (inner_loop); if (nb_iter == chrec_dont_know) return chrec_dont_know; else { tree res; /* evolution_fn is the evolution function in LOOP. Get its value in the nb_iter-th iteration. */ res = chrec_apply (inner_loop-num, evolution_fn, nb_iter); if (chrec_contains_symbols_defined_in_loop (res, loop-num)) res = instantiate_parameters (loop, res); /* Continue the computation until ending on a parent of LOOP. */ return compute_overall_effect_of_inner_loop (loop, res); } } In theory, we can still have the transformation like below even if the step is i+=k, # i_13 = PHI i_6(7), k_2(D)(4) i_14 = i_13, a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + k_2(D); // i+=k if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = a[i_14]; a_p = a_p_lsm.5_11; goto bb 3; But I realize this is not a loop closed SSA form at all, because i_14 is being used out of the loop. Where could we extend the liverange of variable i in GCC infrastructure and finally solve this problem? It seems many people are still in the happy of the upcoming 2012 New Year. :-) Following my story, I find the following code in tree-ssa-copy.c /* Avoid copy propagation from an inner into an outer loop. Otherwise, this may move loop variant variables outside of their loops and prevent coalescing opportunities. If the value was loop invariant, it will be hoisted by LICM and exposed for copy propagation. ??? The value will be always loop invariant. In loop-closed SSA form do not copy-propagate through PHI nodes in blocks with a loop exit edge predecessor. */ if (current_loops TREE_CODE (arg_value) == SSA_NAME (loop_depth_of_name (arg_value) loop_depth_of_name (lhs) || (loops_state_satisfies_p (LOOP_CLOSED_SSA) loop_exit_edge_p (e-src-loop_father, e { phi_val.value = lhs; break; } Here http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00066.html, Dan said The original check was not because of coalescing, but because we would copy prop in-loop variables outside the loop, causing *more* invariantness in nested loops. Can anybody give me a concrete example to explain this statement? Anyway, for my simple case, I don't see bad thing would happen when propagate a[i] out of the loop. Also, after this propagation, a_p.0_4 = a[i_13]; within the loop would be dead code and removed in the passes afterwards. In my opinion, that way the computation would be reduced in the loop. Did I make any mistake? Thanks, -Jiangning
RE: A case exposing code sink issue
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Tuesday, December 27, 2011 5:10 PM To: 'Richard Guenther' Cc: Michael Matz; gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)). Richard, In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC algorithm to get it calculated? Appreciate your kindly help in advance! Richard, Now I have a patch working for the case of step i++, by directly modifying scalar evolution algorithm. the following code would be generated after SCCP, l # i_13 = PHI i_6(7), k_2(D)(4) a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + 1; if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = MEM[(void *)a + 3996B]; a_p = a_p_lsm.5_11; goto bb 3; It looks good, but I still have problem when the case has step i+=k. For this case the value of variable i exiting loop isn't invariant, the algorithm below in scalar evolution doesn't work on it, compute_overall_effect_of_inner_loop() { ... tree nb_iter = number_of_latch_executions (inner_loop); if (nb_iter == chrec_dont_know) return chrec_dont_know; else { tree res; /* evolution_fn is the evolution function in LOOP. Get its value in the nb_iter-th iteration. */ res = chrec_apply (inner_loop-num, evolution_fn, nb_iter); if (chrec_contains_symbols_defined_in_loop (res, loop-num)) res = instantiate_parameters (loop, res); /* Continue the computation until ending on a parent of LOOP. */ return compute_overall_effect_of_inner_loop (loop, res); } } In theory, we can still have the transformation like below even if the step is i+=k, # i_13 = PHI i_6(7), k_2(D)(4) i_14 = i_13, a_p.0_4 = a[i_13]; MEM[(int *)a][i_13] = 100; i_6 = i_13 + k_2(D); // i+=k if (i_6 = 999) goto bb 7; else goto bb 6; bb 6: a_p_lsm.5_11 = a[i_14]; a_p = a_p_lsm.5_11; goto bb 3; But I realize this is not a loop closed SSA form at all, because i_14 is being used out of the loop. Where could we extend the liverange of variable i in GCC infrastructure and finally solve this problem? Thanks, -Jiangning
RE: A case exposing code sink issue
The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)). Richard, In final value replacement, expression a + D. can be figured out, while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC algorithm to get it calculated? Appreciate your kindly help in advance! Thanks, -Jiangning
RE: A case exposing code sink issue
Yes, the number of iterations of the i loop simply is too difficult for our loop iteration calculator to comprehend: for (i=k; i500; i+=k) iterates for roundup((500-k)/k) time. In particular if the step is non-constant our nr-of-iteration calculator gives up. I'm trying to give an even smaller case, int a[512] ; int *a_p ; int f(int k) { int i ; for(i=0; ik; i++) { a_p = a[i] ; *a_p = 7 ; } } For this case, we have a very simple loop step i++, then we would have the GIMPLE before expand like below, bb 5: # i_13 = PHI i_6(5), 0(4) # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4) a_p_lsm.6_4 = a[i_13]; ivtmp.10_1 = ivtmp.10_9 + 4; D.4085_16 = (void *) ivtmp.10_1; MEM[base: D.4085_16, offset: 0B] = 7; i_6 = i_13 + 1; if (i_6 != k_3(D)) goto bb 5; else goto bb 6; bb 6: # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5) a_p = a_p_lsm.6_11; goto bb 3; Why can't we still sunk a[i_13] out of loop? For example, I expect to generate the code like below, bb 5: # i_13 = PHI i_6(5), 0(4) # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4) i_14 = i_13; ivtmp.10_1 = ivtmp.10_9 + 4; D.4085_16 = (void *) ivtmp.10_1; MEM[base: D.4085_16, offset: 0B] = 7; i_6 = i_13 + 1; if (i_6 != k_3(D)) goto bb 5; else goto bb 6; bb 6: # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5) a_p_lsm.6_4 = a[i_14]; a_p = a_p_lsm.6_11; goto bb 3; This way the computation of a[i] would be saved within the loop. Any idea? Thanks, -Jiangning
Re: A case exposing code sink issue
On Thu, Dec 22, 2011 at 9:25 AM, Jiangning Liu jiangning@arm.com wrote: Yes, the number of iterations of the i loop simply is too difficult fo our loop iteration calculator to comprehend: for (i=k; i500; i+=k) iterates for roundup((500-k)/k) time. In particular if the step is non-constant our nr-of-iteration calculator gives up. I'm trying to give an even smaller case, int a[512] ; int *a_p ; int f(int k) { int i ; for(i=0; ik; i++) { a_p = a[i] ; *a_p = 7 ; } } For this case, we have a very simple loop step i++, then we would have the GIMPLE before expand like below, bb 5: # i_13 = PHI i_6(5), 0(4) # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4) a_p_lsm.6_4 = a[i_13]; ivtmp.10_1 = ivtmp.10_9 + 4; D.4085_16 = (void *) ivtmp.10_1; MEM[base: D.4085_16, offset: 0B] = 7; i_6 = i_13 + 1; if (i_6 != k_3(D)) goto bb 5; else goto bb 6; bb 6: # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5) a_p = a_p_lsm.6_11; goto bb 3; Why can't we still sunk a[i_13] out of loop? For example, I expect to generate the code like below, bb 5: # i_13 = PHI i_6(5), 0(4) # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4) i_14 = i_13; ivtmp.10_1 = ivtmp.10_9 + 4; D.4085_16 = (void *) ivtmp.10_1; MEM[base: D.4085_16, offset: 0B] = 7; i_6 = i_13 + 1; if (i_6 != k_3(D)) goto bb 5; else goto bb 6; bb 6: # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5) a_p_lsm.6_4 = a[i_14]; a_p = a_p_lsm.6_11; goto bb 3; This way the computation of a[i] would be saved within the loop. Any idea? The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)). Richard. Thanks, -Jiangning
RE: A case exposing code sink issue
-Original Message- From: Michael Matz [mailto:m...@suse.de] Sent: Monday, November 28, 2011 9:07 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue Hi, On Mon, 28 Nov 2011, Jiangning Liu wrote: One more question... Can i = i.6_18; be sinked out of loop, because it doesn't have memory dependence with others? With current trunk the stores to i, a_p, b_p and k are sunken after the loop. (There are no aliasing problems because the decls can't conflict). What isn't sunken is the calculation of the a[D.2248_7] expression. First, the number of iterations of the inner loop can't be determined by current code (replacing i+=k with e.g. i++ could be handled for instance). Hi Michael, Do you know what the essential problem is in the case of loop iteration uncertainty? Yes, the number of iterations of the i loop simply is too difficult for our loop iteration calculator to comprehend: for (i=k; i500; i+=k) iterates for roundup((500-k)/k) time. In particular if the step is non-constant our nr-of-iteration calculator gives up. So do you think this can be improved somewhere? For this case, looking at the result in middle end, a_p.2_8 = a[D.2248_7]; should be able to sunken out of loop. That way the computation of a[D.2248_7] would be saved in loop, although the consequence is the liverange of D.2248_7 is longer and it needs to live out of loop. But anyway the register pressure would be decreased within the loop, and we would less possibly have spill/fill code. This is what I want. I think we can simply use loop induction variable analysis to solve this problem. Do you think so? Thanks, -Jiangning I thought it was still an aliasing problem. No. All accesses are resolved to final objects (i.e. no pointers), and hence can be trivially disambiguated. Ciao, Michael.
RE: A case exposing code sink issue
Hi, On Mon, 28 Nov 2011, Jiangning Liu wrote: One more question... Can i = i.6_18; be sinked out of loop, because it doesn't have memory dependence with others? With current trunk the stores to i, a_p, b_p and k are sunken after the loop. (There are no aliasing problems because the decls can't conflict). What isn't sunken is the calculation of the a[D.2248_7] expression. First, the number of iterations of the inner loop can't be determined by current code (replacing i+=k with e.g. i++ could be handled for instance). Hi Michael, Do you know what the essential problem is in the case of loop iteration uncertainty? Yes, the number of iterations of the i loop simply is too difficult for our loop iteration calculator to comprehend: for (i=k; i500; i+=k) iterates for roundup((500-k)/k) time. In particular if the step is non-constant our nr-of-iteration calculator gives up. I thought it was still an aliasing problem. No. All accesses are resolved to final objects (i.e. no pointers), and hence can be trivially disambiguated. Ciao, Michael.
RE: A case exposing code sink issue
-Original Message- From: Michael Matz [mailto:m...@suse.de] Sent: Friday, November 25, 2011 11:23 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue Hi, On Thu, 24 Nov 2011, Jiangning Liu wrote: One more question... Can i = i.6_18; be sinked out of loop, because it doesn't have memory dependence with others? With current trunk the stores to i, a_p, b_p and k are sunken after the loop. (There are no aliasing problems because the decls can't conflict). What isn't sunken is the calculation of the a[D.2248_7] expression. First, the number of iterations of the inner loop can't be determined by current code (replacing i+=k with e.g. i++ could be handled for instance). Hi Michael, Do you know what the essential problem is in the case of loop iteration uncertainty? I thought it was still an aliasing problem. Thanks, -Jiangning Then this code could be handled by final value replacement, but isn't because interpret_rhs_expr doesn't deal with ADDR_EXPR of ARRAY_REFs. Ciao, Michael.
RE: A case exposing code sink issue
Hi, On Thu, 24 Nov 2011, Jiangning Liu wrote: One more question... Can i = i.6_18; be sinked out of loop, because it doesn't have memory dependence with others? With current trunk the stores to i, a_p, b_p and k are sunken after the loop. (There are no aliasing problems because the decls can't conflict). What isn't sunken is the calculation of the a[D.2248_7] expression. First, the number of iterations of the inner loop can't be determined by current code (replacing i+=k with e.g. i++ could be handled for instance). Then this code could be handled by final value replacement, but isn't because interpret_rhs_expr doesn't deal with ADDR_EXPR of ARRAY_REFs. Ciao, Michael.
Re: A case exposing code sink issue
On Wed, Nov 23, 2011 at 8:05 PM, Jiangning Liu jiangning@arm.com wrote: If this is the root cause, which optimization pass in GCC take the role to sink them out of loop? How should we get it fixed? lim1 handles the case just fine for me. lim1 is the first loop pass. After lim1 I get: bb 4: # i.1_34 = PHI i.6_18(9), k.0_9(7) D.2934_5 = pretmp.11_33; D.2936_7 = i.1_34 + D.2934_5; a_p.2_8 = a[D.2936_7]; a_p_lsm.13_37 = a_p.2_8; b_p.3_13 = b[D.2936_7]; b_p_lsm.14_38 = b_p.3_13; MEM[(int *)a][D.2936_7] = 7; MEM[(int *)b][D.2936_7] = 7; i.6_18 = k.0_9 + i.1_34; i_lsm.12_39 = i.6_18; if (i.6_18 = 511) goto bb 9; else goto bb 8; bb 9: goto bb 4; Thanks, Andrew Pinski
RE: A case exposing code sink issue
-Original Message- From: Andrew Pinski [mailto:pins...@gmail.com] Sent: Thursday, November 24, 2011 12:15 PM To: Jiangning Liu Cc: gcc@gcc.gnu.org Subject: Re: A case exposing code sink issue On Wed, Nov 23, 2011 at 8:05 PM, Jiangning Liu jiangning@arm.com wrote: If this is the root cause, which optimization pass in GCC take the role to sink them out of loop? How should we get it fixed? lim1 handles the case just fine for me. lim1 is the first loop pass. After lim1 I get: bb 4: # i.1_34 = PHI i.6_18(9), k.0_9(7) D.2934_5 = pretmp.11_33; D.2936_7 = i.1_34 + D.2934_5; a_p.2_8 = a[D.2936_7]; a_p_lsm.13_37 = a_p.2_8; b_p.3_13 = b[D.2936_7]; b_p_lsm.14_38 = b_p.3_13; MEM[(int *)a][D.2936_7] = 7; MEM[(int *)b][D.2936_7] = 7; i.6_18 = k.0_9 + i.1_34; i_lsm.12_39 = i.6_18; if (i.6_18 = 511) goto bb 9; else goto bb 8; bb 9: goto bb 4; a[D.2936_7] and b[D.2936_7] are not loop invariants, so it seems lim1 shouldn't be able to sink them, right? Do I misunderstand this optimization? Thanks, -Jiangning Thanks, Andrew Pinski
RE: A case exposing code sink issue
Sorry, I realize we can't do that optimization because a_p may have dependence upon other memory accesses like MEM[(int *)a][D.2248_7]. For example, if it happens a_p equals a_p, that optimization would be wrong. But can alias analysis solve the problem if we can guarantee (i+(1k)) is less than the upbound of array a's definition? Or is there any GCC command line switch assuming no array bound overflow? That way we can do more aggressive optimizations, right? Thanks, -Jiangning -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Thursday, November 24, 2011 12:05 PM To: gcc@gcc.gnu.org Subject: A case exposing code sink issue Hi, For this small test case, int a[512] ; int b[512] ; int *a_p ; int *b_p ; int i ; int k ; int f(void) { for( k = 1 ; k = 9 ; k++) { for( i = k ; i 512 ; i += k) { a_p = a[i + (1k)] ; b_p = b[i + (1k)] ; *a_p = 7 ; *b_p = 7 ; } } } Before sink pass we have, f () { int pretmp.11; int k.7; int i.6; int * b_p.3; int * a_p.2; int D.2248; int i.1; int D.2246; int k.0; bb 2: k = 1; bb 3: # k.0_9 = PHI k.7_20(11), 1(2) i = k.0_9; if (k.0_9 = 511) goto bb 7; else goto bb 8; bb 8: Invalid sum of incoming frequencies 900, should be 81 goto bb 5; bb 7: pretmp.11_19 = 1 k.0_9; bb 4: # i.1_34 = PHI i.6_18(9), k.0_9(7) D.2246_5 = pretmp.11_19; D.2248_7 = i.1_34 + D.2246_5; a_p.2_8 = a[D.2248_7]; a_p = a_p.2_8; b_p.3_13 = b[D.2248_7]; b_p = b_p.3_13; MEM[(int *)a][D.2248_7] = 7; MEM[(int *)b][D.2248_7] = 7; i.6_18 = k.0_9 + i.1_34; i = i.6_18; if (i.6_18 = 511) goto bb 9; else goto bb 8; bb 9: goto bb 4; bb 5: Invalid sum of incoming frequencies 81, should be 900 k.7_20 = k.0_9 + 1; k = k.7_20; if (k.7_20 = 9) goto bb 11; else goto bb 6; bb 11: goto bb 3; bb 6: return; } Can the following statements be sinked out of loop? I don't see this optimization happen in trunk. The consequence is register pressure increased and a spill/fill occurs in RA. a_p.2_8 = a[D.2248_7]; a_p = a_p.2_8; b_p.3_13 = b[D.2248_7]; b_p = b_p.3_13; I know the sink would happen in sink pass if a_p and b_p are local variables. If this is the root cause, which optimization pass in GCC take the role to sink them out of loop? How should we get it fixed? Thanks, -Jiangning
RE: A case exposing code sink issue
One more question... Can i = i.6_18; be sinked out of loop, because it doesn't have memory dependence with others? Thanks, -Jiangning -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Thursday, November 24, 2011 2:57 PM To: gcc@gcc.gnu.org Subject: RE: A case exposing code sink issue Sorry, I realize we can't do that optimization because a_p may have dependence upon other memory accesses like MEM[(int *)a][D.2248_7]. For example, if it happens a_p equals a_p, that optimization would be wrong. But can alias analysis solve the problem if we can guarantee (i+(1k)) is less than the upbound of array a's definition? Or is there any GCC command line switch assuming no array bound overflow? That way we can do more aggressive optimizations, right? Thanks, -Jiangning -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Thursday, November 24, 2011 12:05 PM To: gcc@gcc.gnu.org Subject: A case exposing code sink issue Hi, For this small test case, int a[512] ; int b[512] ; int *a_p ; int *b_p ; int i ; int k ; int f(void) { for( k = 1 ; k = 9 ; k++) { for( i = k ; i 512 ; i += k) { a_p = a[i + (1k)] ; b_p = b[i + (1k)] ; *a_p = 7 ; *b_p = 7 ; } } } Before sink pass we have, f () { int pretmp.11; int k.7; int i.6; int * b_p.3; int * a_p.2; int D.2248; int i.1; int D.2246; int k.0; bb 2: k = 1; bb 3: # k.0_9 = PHI k.7_20(11), 1(2) i = k.0_9; if (k.0_9 = 511) goto bb 7; else goto bb 8; bb 8: Invalid sum of incoming frequencies 900, should be 81 goto bb 5; bb 7: pretmp.11_19 = 1 k.0_9; bb 4: # i.1_34 = PHI i.6_18(9), k.0_9(7) D.2246_5 = pretmp.11_19; D.2248_7 = i.1_34 + D.2246_5; a_p.2_8 = a[D.2248_7]; a_p = a_p.2_8; b_p.3_13 = b[D.2248_7]; b_p = b_p.3_13; MEM[(int *)a][D.2248_7] = 7; MEM[(int *)b][D.2248_7] = 7; i.6_18 = k.0_9 + i.1_34; i = i.6_18; if (i.6_18 = 511) goto bb 9; else goto bb 8; bb 9: goto bb 4; bb 5: Invalid sum of incoming frequencies 81, should be 900 k.7_20 = k.0_9 + 1; k = k.7_20; if (k.7_20 = 9) goto bb 11; else goto bb 6; bb 11: goto bb 3; bb 6: return; } Can the following statements be sinked out of loop? I don't see this optimization happen in trunk. The consequence is register pressure increased and a spill/fill occurs in RA. a_p.2_8 = a[D.2248_7]; a_p = a_p.2_8; b_p.3_13 = b[D.2248_7]; b_p = b_p.3_13; I know the sink would happen in sink pass if a_p and b_p are local variables. If this is the root cause, which optimization pass in GCC take the role to sink them out of loop? How should we get it fixed? Thanks, -Jiangning