RE: A case exposing code sink issue

2012-01-05 Thread Jiangning Liu
   In final value replacement, expression a + D. can be
 figured
  out,
   while a[i_xxx] failed to be CHRECed, so I'm wondering if we
 should
   lower
   a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC
 intends
  to
   keep
   a[i_xxx] until cfgexpand pass. 

Richard,

Thanks a lot for your explanation!

Any comments for this question as well? Does GCC intends to keep a[i] until
expand pass? Any special reason? If I change CHREC algorithm, I see ivopt
would have done this lowering, so we wouldn't see a[i] in expand pass. Any
hurt?

Thanks,
-Jiangning






Re: A case exposing code sink issue

2012-01-05 Thread Richard Guenther
On Thu, Jan 5, 2012 at 9:34 AM, Jiangning Liu jiangning@arm.com wrote:
   In final value replacement, expression a + D. can be
 figured
  out,
   while a[i_xxx] failed to be CHRECed, so I'm wondering if we
 should
   lower
   a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC
 intends
  to
   keep
   a[i_xxx] until cfgexpand pass.

 Richard,

 Thanks a lot for your explanation!

 Any comments for this question as well? Does GCC intends to keep a[i] until
 expand pass?

It's kept as it suits - it's shorter and easier to analyze (we know the implicit
multiplication won't overflow)

 Any special reason? If I change CHREC algorithm, I see ivopt
 would have done this lowering, so we wouldn't see a[i] in expand pass. Any
 hurt?

No, I think changing the CHREC algorithm is ok (I didn't look closely at
your patch).  This is stage1 material though.

Thanks,
Richard.

 Thanks,
 -Jiangning






RE: A case exposing code sink issue

2012-01-05 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Thursday, January 05, 2012 5:32 PM
 To: Jiangning Liu
 Cc: Michael Matz; gcc@gcc.gnu.org
 Subject: Re: A case exposing code sink issue
 
 On Thu, Jan 5, 2012 at 9:34 AM, Jiangning Liu jiangning@arm.com
 wrote:
In final value replacement, expression a + D. can be
  figured
   out,
while a[i_xxx] failed to be CHRECed, so I'm wondering if we
  should
lower
a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC
  intends
   to
keep
a[i_xxx] until cfgexpand pass.
 
  Richard,
 
  Thanks a lot for your explanation!
 
  Any comments for this question as well? Does GCC intends to keep a[i]
 until
  expand pass?
 
 It's kept as it suits - it's shorter and easier to analyze (we know the
 implicit
 multiplication won't overflow)
 
  Any special reason? If I change CHREC algorithm, I see ivopt
  would have done this lowering, so we wouldn't see a[i] in expand
 pass. Any
  hurt?
 
 No, I think changing the CHREC algorithm is ok (I didn't look closely
 at
 your patch).  This is stage1 material though.

Actually I didn't send it out at all. :-) And I just send out a RFC here
http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00236.html, can you have a
look?

 
 Thanks,
 Richard.
 
  Thanks,
  -Jiangning
 
 
 
 






Re: A case exposing code sink issue

2012-01-02 Thread Richard Guenther
On Thu, Dec 29, 2011 at 9:50 AM, Jiangning Liu jiangning@arm.com wrote:


 -Original Message-
 From: Jiangning Liu
 Sent: Wednesday, December 28, 2011 5:38 PM
 To: Jiangning Liu; 'Richard Guenther'
 Cc: Michael Matz; gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue



  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  Jiangning Liu
  Sent: Tuesday, December 27, 2011 5:10 PM
  To: 'Richard Guenther'
  Cc: Michael Matz; gcc@gcc.gnu.org
  Subject: RE: A case exposing code sink issue
 
  
   The job to do this is final value replacement, not sinking (we do
 not
   sink non-invariant expressions - you'd have to translate them
 through
   the loop-closed SSA exit PHI node, certainly doable, patches
   welcome ;)).
  
 
  Richard,
 
  In final value replacement, expression a + D. can be figured
 out,
  while a[i_xxx] failed to be CHRECed, so I'm wondering if we should
  lower
  a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends
 to
  keep
  a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC
  algorithm to get it calculated?
 
  Appreciate your kindly help in advance!
 

 Richard,

 Now I have a patch working for the case of step i++, by directly
 modifying
 scalar evolution algorithm. the following code would be generated after
 SCCP,
 l
   # i_13 = PHI i_6(7), k_2(D)(4)
   a_p.0_4 = a[i_13];
   MEM[(int *)a][i_13] = 100;
   i_6 = i_13 + 1;
   if (i_6 = 999)
     goto bb 7;
   else
     goto bb 6;

 bb 6:
   a_p_lsm.5_11 = MEM[(void *)a + 3996B];
   a_p = a_p_lsm.5_11;
   goto bb 3;

 It looks good, but I still have problem when the case has step i+=k.

 For this case the value of variable i exiting loop isn't invariant, the
 algorithm below in scalar evolution doesn't work on it,

 compute_overall_effect_of_inner_loop()
 {
   ...
         tree nb_iter = number_of_latch_executions (inner_loop);

         if (nb_iter == chrec_dont_know)
           return chrec_dont_know;
         else
           {
             tree res;

             /* evolution_fn is the evolution function in LOOP.  Get
                its value in the nb_iter-th iteration.  */
             res = chrec_apply (inner_loop-num, evolution_fn, nb_iter);

             if (chrec_contains_symbols_defined_in_loop (res, loop-num))
               res = instantiate_parameters (loop, res);

             /* Continue the computation until ending on a parent of LOOP.
 */
             return compute_overall_effect_of_inner_loop (loop, res);
           }
 }

 In theory, we can still have the transformation like below even if the
 step
 is i+=k,

   # i_13 = PHI i_6(7), k_2(D)(4)
   i_14 = i_13,
   a_p.0_4 = a[i_13];
   MEM[(int *)a][i_13] = 100;
   i_6 = i_13 + k_2(D);     // i+=k
   if (i_6 = 999)
     goto bb 7;
   else
     goto bb 6;

 bb 6:
   a_p_lsm.5_11 = a[i_14];
   a_p = a_p_lsm.5_11;
   goto bb 3;

 But I realize this is not a loop closed SSA form at all, because i_14
 is
 being used out of the loop. Where could we extend the liverange of
 variable
 i in GCC infrastructure and finally solve this problem?


 It seems many people are still in the happy of the upcoming 2012 New Year.
 :-)

 Following my story, I find the following code in tree-ssa-copy.c

      /* Avoid copy propagation from an inner into an outer loop.
         Otherwise, this may move loop variant variables outside of
         their loops and prevent coalescing opportunities.  If the
         value was loop invariant, it will be hoisted by LICM and
         exposed for copy propagation.
         ???  The value will be always loop invariant.
         In loop-closed SSA form do not copy-propagate through
         PHI nodes in blocks with a loop exit edge predecessor.  */
      if (current_loops
           TREE_CODE (arg_value) == SSA_NAME
           (loop_depth_of_name (arg_value)  loop_depth_of_name (lhs)
              || (loops_state_satisfies_p (LOOP_CLOSED_SSA)
                   loop_exit_edge_p (e-src-loop_father, e
        {
          phi_val.value = lhs;
          break;
        }

 Here http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00066.html, Dan said

 The original check was not because of coalescing, but because we would
 copy prop in-loop variables outside the loop, causing *more*
 invariantness in nested loops.

 Can anybody give me a concrete example to explain this statement?

I can neither make sense of the code nor of Dans comment.  But of
course the part about loop-closed-SSA is still true.  I _suppose_ that
the situation is like

 tem = stuff;
 for ()
   x = tem;
 foo = x;

and Dan expects LIM to hoist x = tem (instead of sinking it by copyprop).
Then we'd coalesce tem and x (but it would be still live across the loop),
instead of coalescing x and foo (keeping tem live over the loop).  LIM of
course does not hoist this kind of stuff because it appears too cheap.

 Anyway, for my simple case, I don't see bad thing would happen when
 propagate a[i] out

RE: A case exposing code sink issue

2011-12-29 Thread Jiangning Liu


 -Original Message-
 From: Jiangning Liu
 Sent: Wednesday, December 28, 2011 5:38 PM
 To: Jiangning Liu; 'Richard Guenther'
 Cc: Michael Matz; gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue
 
 
 
  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  Jiangning Liu
  Sent: Tuesday, December 27, 2011 5:10 PM
  To: 'Richard Guenther'
  Cc: Michael Matz; gcc@gcc.gnu.org
  Subject: RE: A case exposing code sink issue
 
  
   The job to do this is final value replacement, not sinking (we do
 not
   sink non-invariant expressions - you'd have to translate them
 through
   the loop-closed SSA exit PHI node, certainly doable, patches
   welcome ;)).
  
 
  Richard,
 
  In final value replacement, expression a + D. can be figured
 out,
  while a[i_xxx] failed to be CHRECed, so I'm wondering if we should
  lower
  a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends
 to
  keep
  a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC
  algorithm to get it calculated?
 
  Appreciate your kindly help in advance!
 
 
 Richard,
 
 Now I have a patch working for the case of step i++, by directly
 modifying
 scalar evolution algorithm. the following code would be generated after
 SCCP,
 l
   # i_13 = PHI i_6(7), k_2(D)(4)
   a_p.0_4 = a[i_13];
   MEM[(int *)a][i_13] = 100;
   i_6 = i_13 + 1;
   if (i_6 = 999)
 goto bb 7;
   else
 goto bb 6;
 
 bb 6:
   a_p_lsm.5_11 = MEM[(void *)a + 3996B];
   a_p = a_p_lsm.5_11;
   goto bb 3;
 
 It looks good, but I still have problem when the case has step i+=k.
 
 For this case the value of variable i exiting loop isn't invariant, the
 algorithm below in scalar evolution doesn't work on it,
 
 compute_overall_effect_of_inner_loop()
 {
   ...
 tree nb_iter = number_of_latch_executions (inner_loop);
 
 if (nb_iter == chrec_dont_know)
   return chrec_dont_know;
 else
   {
 tree res;
 
 /* evolution_fn is the evolution function in LOOP.  Get
its value in the nb_iter-th iteration.  */
 res = chrec_apply (inner_loop-num, evolution_fn, nb_iter);
 
 if (chrec_contains_symbols_defined_in_loop (res, loop-num))
   res = instantiate_parameters (loop, res);
 
 /* Continue the computation until ending on a parent of LOOP.
 */
 return compute_overall_effect_of_inner_loop (loop, res);
   }
 }
 
 In theory, we can still have the transformation like below even if the
 step
 is i+=k,
 
   # i_13 = PHI i_6(7), k_2(D)(4)
   i_14 = i_13,
   a_p.0_4 = a[i_13];
   MEM[(int *)a][i_13] = 100;
   i_6 = i_13 + k_2(D); // i+=k
   if (i_6 = 999)
 goto bb 7;
   else
 goto bb 6;
 
 bb 6:
   a_p_lsm.5_11 = a[i_14];
   a_p = a_p_lsm.5_11;
   goto bb 3;
 
 But I realize this is not a loop closed SSA form at all, because i_14
 is
 being used out of the loop. Where could we extend the liverange of
 variable
 i in GCC infrastructure and finally solve this problem?
 

It seems many people are still in the happy of the upcoming 2012 New Year.
:-)

Following my story, I find the following code in tree-ssa-copy.c

  /* Avoid copy propagation from an inner into an outer loop.
 Otherwise, this may move loop variant variables outside of
 their loops and prevent coalescing opportunities.  If the
 value was loop invariant, it will be hoisted by LICM and
 exposed for copy propagation.
 ???  The value will be always loop invariant.
 In loop-closed SSA form do not copy-propagate through
 PHI nodes in blocks with a loop exit edge predecessor.  */
  if (current_loops
   TREE_CODE (arg_value) == SSA_NAME
   (loop_depth_of_name (arg_value)  loop_depth_of_name (lhs)
  || (loops_state_satisfies_p (LOOP_CLOSED_SSA)
   loop_exit_edge_p (e-src-loop_father, e
{
  phi_val.value = lhs;
  break;
}

Here http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00066.html, Dan said 

The original check was not because of coalescing, but because we would
copy prop in-loop variables outside the loop, causing *more*
invariantness in nested loops.

Can anybody give me a concrete example to explain this statement?

Anyway, for my simple case, I don't see bad thing would happen when
propagate a[i] out of the loop. Also, after this propagation, a_p.0_4 =
a[i_13]; within the loop would be dead code and removed in the passes
afterwards. In my opinion, that way the computation would be reduced in the
loop. Did I make any mistake?

  Thanks,
  -Jiangning
 
 
 
 
 






RE: A case exposing code sink issue

2011-12-28 Thread Jiangning Liu


 -Original Message-
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 Jiangning Liu
 Sent: Tuesday, December 27, 2011 5:10 PM
 To: 'Richard Guenther'
 Cc: Michael Matz; gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue
 
 
  The job to do this is final value replacement, not sinking (we do not
  sink non-invariant expressions - you'd have to translate them through
  the loop-closed SSA exit PHI node, certainly doable, patches
  welcome ;)).
 
 
 Richard,
 
 In final value replacement, expression a + D. can be figured out,
 while a[i_xxx] failed to be CHRECed, so I'm wondering if we should
 lower
 a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to
 keep
 a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC
 algorithm to get it calculated?
 
 Appreciate your kindly help in advance!
 

Richard,

Now I have a patch working for the case of step i++, by directly modifying
scalar evolution algorithm. the following code would be generated after
SCCP,
l
  # i_13 = PHI i_6(7), k_2(D)(4)
  a_p.0_4 = a[i_13];
  MEM[(int *)a][i_13] = 100;
  i_6 = i_13 + 1;
  if (i_6 = 999)
goto bb 7;
  else
goto bb 6;

bb 6:
  a_p_lsm.5_11 = MEM[(void *)a + 3996B];
  a_p = a_p_lsm.5_11;
  goto bb 3;

It looks good, but I still have problem when the case has step i+=k. 

For this case the value of variable i exiting loop isn't invariant, the
algorithm below in scalar evolution doesn't work on it,

compute_overall_effect_of_inner_loop()
{
  ...
  tree nb_iter = number_of_latch_executions (inner_loop);

  if (nb_iter == chrec_dont_know)
return chrec_dont_know;
  else
{
  tree res;

  /* evolution_fn is the evolution function in LOOP.  Get
 its value in the nb_iter-th iteration.  */
  res = chrec_apply (inner_loop-num, evolution_fn, nb_iter);

  if (chrec_contains_symbols_defined_in_loop (res, loop-num))
res = instantiate_parameters (loop, res);

  /* Continue the computation until ending on a parent of LOOP.
*/
  return compute_overall_effect_of_inner_loop (loop, res);
}
}

In theory, we can still have the transformation like below even if the step
is i+=k,

  # i_13 = PHI i_6(7), k_2(D)(4)
  i_14 = i_13,
  a_p.0_4 = a[i_13];
  MEM[(int *)a][i_13] = 100;
  i_6 = i_13 + k_2(D); // i+=k
  if (i_6 = 999)
goto bb 7;
  else
goto bb 6;

bb 6:
  a_p_lsm.5_11 = a[i_14];
  a_p = a_p_lsm.5_11;
  goto bb 3;

But I realize this is not a loop closed SSA form at all, because i_14 is
being used out of the loop. Where could we extend the liverange of variable
i in GCC infrastructure and finally solve this problem?

 Thanks,
 -Jiangning
 
 
 






RE: A case exposing code sink issue

2011-12-27 Thread Jiangning Liu
 
 The job to do this is final value replacement, not sinking (we do not
 sink non-invariant expressions - you'd have to translate them through
 the loop-closed SSA exit PHI node, certainly doable, patches
 welcome ;)).
 

Richard,

In final value replacement, expression a + D. can be figured out,
while a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower
a[i_xxx] to a + unitsize(a) * i_xxx first? It seems GCC intends to keep
a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC
algorithm to get it calculated?

Appreciate your kindly help in advance!

Thanks,
-Jiangning





RE: A case exposing code sink issue

2011-12-22 Thread Jiangning Liu
 Yes, the number of iterations of the i loop simply is too difficult for
 our loop iteration calculator to comprehend:
 
   for (i=k; i500; i+=k)
 
 iterates for roundup((500-k)/k) time.  In particular if the step is
 non-constant our nr-of-iteration calculator gives up.
 

I'm trying to give an even smaller case,

int a[512] ;
int *a_p ;

int f(int k)
{
int i ;

for(i=0; ik; i++)
{
a_p = a[i] ;
*a_p = 7 ;
}
}

For this case, we have a very simple loop step i++, then we would have the
GIMPLE before expand like below,

bb 5:
  # i_13 = PHI i_6(5), 0(4)
  # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4)
  a_p_lsm.6_4 = a[i_13];
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
goto bb 5;
  else
goto bb 6;

bb 6:
  # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5)
  a_p = a_p_lsm.6_11;
  goto bb 3;

Why can't we still sunk a[i_13] out of loop? For example, I expect to
generate the code like below,

bb 5:
  # i_13 = PHI i_6(5), 0(4)
  # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4)
  i_14 = i_13;
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
goto bb 5;
  else
goto bb 6;

bb 6:
  # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5)
  a_p_lsm.6_4 = a[i_14];
  a_p = a_p_lsm.6_11;
  goto bb 3;

This way the computation of a[i] would be saved within the loop. 

Any idea?

Thanks,
-Jiangning





Re: A case exposing code sink issue

2011-12-22 Thread Richard Guenther
On Thu, Dec 22, 2011 at 9:25 AM, Jiangning Liu jiangning@arm.com wrote:
 Yes, the number of iterations of the i loop simply is too difficult fo
 our loop iteration calculator to comprehend:

   for (i=k; i500; i+=k)

 iterates for roundup((500-k)/k) time.  In particular if the step is
 non-constant our nr-of-iteration calculator gives up.


 I'm trying to give an even smaller case,

 int a[512] ;
 int *a_p ;

 int f(int k)
 {
        int i ;

        for(i=0; ik; i++)
        {
                a_p = a[i] ;
                *a_p = 7 ;
        }
 }

 For this case, we have a very simple loop step i++, then we would have the
 GIMPLE before expand like below,

 bb 5:
  # i_13 = PHI i_6(5), 0(4)
  # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4)
  a_p_lsm.6_4 = a[i_13];
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
    goto bb 5;
  else
    goto bb 6;

 bb 6:
  # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5)
  a_p = a_p_lsm.6_11;
  goto bb 3;

 Why can't we still sunk a[i_13] out of loop? For example, I expect to
 generate the code like below,

 bb 5:
  # i_13 = PHI i_6(5), 0(4)
  # ivtmp.10_9 = PHI ivtmp.10_1(5), ivtmp.10_15(4)
  i_14 = i_13;
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
    goto bb 5;
  else
    goto bb 6;

 bb 6:
  # a_p_lsm.6_11 = PHI a_p_lsm.6_4(5)
  a_p_lsm.6_4 = a[i_14];
  a_p = a_p_lsm.6_11;
  goto bb 3;

 This way the computation of a[i] would be saved within the loop.

 Any idea?

The job to do this is final value replacement, not sinking (we do not
sink non-invariant expressions - you'd have to translate them through
the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)).

Richard.

 Thanks,
 -Jiangning





RE: A case exposing code sink issue

2011-12-01 Thread Jiangning Liu


 -Original Message-
 From: Michael Matz [mailto:m...@suse.de]
 Sent: Monday, November 28, 2011 9:07 PM
 To: Jiangning Liu
 Cc: gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue
 
 Hi,
 
 On Mon, 28 Nov 2011, Jiangning Liu wrote:
 
One more question...
   
Can  i = i.6_18; be sinked out of loop, because it doesn't have
   memory
dependence with others?
  
   With current trunk the stores to i, a_p, b_p and k are sunken after
 the
   loop.  (There are no aliasing problems because the decls can't
   conflict).
  
   What isn't sunken is the calculation of the a[D.2248_7] expression.
   First, the number of iterations of the inner loop can't be
 determined
   by
   current code (replacing i+=k with e.g. i++ could be handled for
   instance).
 
  Hi Michael,
 
  Do you know what the essential problem is in the case of loop
 iteration
  uncertainty?
 
 Yes, the number of iterations of the i loop simply is too difficult for
 our loop iteration calculator to comprehend:
 
   for (i=k; i500; i+=k)
 
 iterates for roundup((500-k)/k) time.  In particular if the step is
 non-constant our nr-of-iteration calculator gives up.

So do you think this can be improved somewhere?

For this case, looking at the result in middle end, a_p.2_8 =
a[D.2248_7]; should be able to sunken out of loop. That way the
computation of a[D.2248_7] would be saved in loop, although the consequence
is the liverange of D.2248_7 is longer and it needs to live out of loop. But
anyway the register pressure would be decreased within the loop, and we
would less possibly have spill/fill code. This is what I want.

I think we can simply use loop induction variable analysis to solve this
problem. Do you think so?

Thanks,
-Jiangning

 
  I thought it was still an aliasing problem.
 
 No.  All accesses are resolved to final objects (i.e. no pointers), and
 hence can be trivially disambiguated.
 
 
 Ciao,
 Michael.






RE: A case exposing code sink issue

2011-11-28 Thread Michael Matz
Hi,

On Mon, 28 Nov 2011, Jiangning Liu wrote:

   One more question...
  
   Can  i = i.6_18; be sinked out of loop, because it doesn't have
  memory
   dependence with others?
  
  With current trunk the stores to i, a_p, b_p and k are sunken after the
  loop.  (There are no aliasing problems because the decls can't
  conflict).
  
  What isn't sunken is the calculation of the a[D.2248_7] expression.
  First, the number of iterations of the inner loop can't be determined
  by
  current code (replacing i+=k with e.g. i++ could be handled for
  instance).
 
 Hi Michael,
 
 Do you know what the essential problem is in the case of loop iteration
 uncertainty?

Yes, the number of iterations of the i loop simply is too difficult for 
our loop iteration calculator to comprehend:

  for (i=k; i500; i+=k)

iterates for roundup((500-k)/k) time.  In particular if the step is 
non-constant our nr-of-iteration calculator gives up.

 I thought it was still an aliasing problem.

No.  All accesses are resolved to final objects (i.e. no pointers), and 
hence can be trivially disambiguated.


Ciao,
Michael.


RE: A case exposing code sink issue

2011-11-27 Thread Jiangning Liu


 -Original Message-
 From: Michael Matz [mailto:m...@suse.de]
 Sent: Friday, November 25, 2011 11:23 PM
 To: Jiangning Liu
 Cc: gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue
 
 Hi,
 
 On Thu, 24 Nov 2011, Jiangning Liu wrote:
 
  One more question...
 
  Can  i = i.6_18; be sinked out of loop, because it doesn't have
 memory
  dependence with others?
 
 With current trunk the stores to i, a_p, b_p and k are sunken after the
 loop.  (There are no aliasing problems because the decls can't
 conflict).
 
 What isn't sunken is the calculation of the a[D.2248_7] expression.
 First, the number of iterations of the inner loop can't be determined
 by
 current code (replacing i+=k with e.g. i++ could be handled for
 instance).

Hi Michael,

Do you know what the essential problem is in the case of loop iteration
uncertainty? I thought it was still an aliasing problem.

Thanks,
-Jiangning

 Then this code could be handled by final value replacement, but isn't
 because interpret_rhs_expr doesn't deal with ADDR_EXPR of ARRAY_REFs.
 
 
 Ciao,
 Michael.






RE: A case exposing code sink issue

2011-11-25 Thread Michael Matz
Hi,

On Thu, 24 Nov 2011, Jiangning Liu wrote:

 One more question...
 
 Can  i = i.6_18; be sinked out of loop, because it doesn't have memory
 dependence with others?

With current trunk the stores to i, a_p, b_p and k are sunken after the 
loop.  (There are no aliasing problems because the decls can't conflict).

What isn't sunken is the calculation of the a[D.2248_7] expression.  
First, the number of iterations of the inner loop can't be determined by 
current code (replacing i+=k with e.g. i++ could be handled for instance).
Then this code could be handled by final value replacement, but isn't 
because interpret_rhs_expr doesn't deal with ADDR_EXPR of ARRAY_REFs.


Ciao,
Michael.


A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
Hi,

For this small test case,

int a[512] ;
int b[512] ;
int *a_p ;
int *b_p ;
int i ;
int k ;

int f(void)
{
for( k = 1 ; k = 9 ; k++)
{
for( i = k ; i  512 ; i += k)
{
a_p = a[i + (1k)] ;
b_p = b[i + (1k)] ;
*a_p = 7 ;
*b_p = 7 ;
}
}
}

Before sink pass we have,

f ()
{
  int pretmp.11;
  int k.7;
  int i.6;
  int * b_p.3;
  int * a_p.2;
  int D.2248;
  int i.1;
  int D.2246;
  int k.0;

bb 2:
  k = 1;

bb 3:
  # k.0_9 = PHI k.7_20(11), 1(2)
  i = k.0_9;
  if (k.0_9 = 511)
goto bb 7;
  else
goto bb 8;

bb 8:
Invalid sum of incoming frequencies 900, should be 81
  goto bb 5;

bb 7:
  pretmp.11_19 = 1  k.0_9;

bb 4:
  # i.1_34 = PHI i.6_18(9), k.0_9(7)
  D.2246_5 = pretmp.11_19;
  D.2248_7 = i.1_34 + D.2246_5;
  a_p.2_8 = a[D.2248_7];
  a_p = a_p.2_8;
  b_p.3_13 = b[D.2248_7];
  b_p = b_p.3_13;
  MEM[(int *)a][D.2248_7] = 7;
  MEM[(int *)b][D.2248_7] = 7;
  i.6_18 = k.0_9 + i.1_34;
  i = i.6_18;
  if (i.6_18 = 511)
goto bb 9;
  else
goto bb 8;

bb 9:
  goto bb 4;

bb 5:
Invalid sum of incoming frequencies 81, should be 900
  k.7_20 = k.0_9 + 1;
  k = k.7_20;
  if (k.7_20 = 9)
goto bb 11;
  else
goto bb 6;

bb 11:
  goto bb 3;

bb 6:
  return;

}

Can the following statements be sinked out of loop? I don't see this
optimization happen in trunk. The consequence is register pressure increased
and a spill/fill occurs in RA.

  a_p.2_8 = a[D.2248_7];
  a_p = a_p.2_8;
  b_p.3_13 = b[D.2248_7];
  b_p = b_p.3_13;

I know the sink would happen in sink pass if a_p and b_p are local
variables. 

If this is the root cause, which optimization pass in GCC take the role to
sink them out of loop? How should we get it fixed?

Thanks,
-Jiangning





Re: A case exposing code sink issue

2011-11-23 Thread Andrew Pinski
On Wed, Nov 23, 2011 at 8:05 PM, Jiangning Liu jiangning@arm.com wrote:
 If this is the root cause, which optimization pass in GCC take the role to
 sink them out of loop? How should we get it fixed?

lim1 handles the case just fine for me.  lim1 is the first loop pass.

After lim1 I get:

bb 4:
  # i.1_34 = PHI i.6_18(9), k.0_9(7)
  D.2934_5 = pretmp.11_33;
  D.2936_7 = i.1_34 + D.2934_5;
  a_p.2_8 = a[D.2936_7];
  a_p_lsm.13_37 = a_p.2_8;
  b_p.3_13 = b[D.2936_7];
  b_p_lsm.14_38 = b_p.3_13;
  MEM[(int *)a][D.2936_7] = 7;
  MEM[(int *)b][D.2936_7] = 7;
  i.6_18 = k.0_9 + i.1_34;
  i_lsm.12_39 = i.6_18;
  if (i.6_18 = 511)
goto bb 9;
  else
goto bb 8;

bb 9:
  goto bb 4;




Thanks,
Andrew Pinski


RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu


 -Original Message-
 From: Andrew Pinski [mailto:pins...@gmail.com]
 Sent: Thursday, November 24, 2011 12:15 PM
 To: Jiangning Liu
 Cc: gcc@gcc.gnu.org
 Subject: Re: A case exposing code sink issue
 
 On Wed, Nov 23, 2011 at 8:05 PM, Jiangning Liu jiangning@arm.com
 wrote:
  If this is the root cause, which optimization pass in GCC take the
 role to
  sink them out of loop? How should we get it fixed?
 
 lim1 handles the case just fine for me.  lim1 is the first loop pass.
 
 After lim1 I get:
 
 bb 4:
   # i.1_34 = PHI i.6_18(9), k.0_9(7)
   D.2934_5 = pretmp.11_33;
   D.2936_7 = i.1_34 + D.2934_5;
   a_p.2_8 = a[D.2936_7];
   a_p_lsm.13_37 = a_p.2_8;
   b_p.3_13 = b[D.2936_7];
   b_p_lsm.14_38 = b_p.3_13;
   MEM[(int *)a][D.2936_7] = 7;
   MEM[(int *)b][D.2936_7] = 7;
   i.6_18 = k.0_9 + i.1_34;
   i_lsm.12_39 = i.6_18;
   if (i.6_18 = 511)
 goto bb 9;
   else
 goto bb 8;
 
 bb 9:
   goto bb 4;
 

a[D.2936_7] and b[D.2936_7] are not loop invariants, so it seems lim1 
shouldn't be able to sink them, right? Do I misunderstand this optimization?

Thanks,
-Jiangning

 
 
 
 Thanks,
 Andrew Pinski






RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
Sorry, I realize we can't do that optimization because a_p may have
dependence upon other memory accesses like MEM[(int *)a][D.2248_7].

For example, if it happens a_p equals a_p, that optimization would be
wrong.

But can alias analysis solve the problem if we can guarantee (i+(1k)) is
less than the upbound of array a's definition? 

Or is there any GCC command line switch assuming no array bound overflow?
That way we can do more aggressive optimizations, right?

Thanks,
-Jiangning

 -Original Message-
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 Jiangning Liu
 Sent: Thursday, November 24, 2011 12:05 PM
 To: gcc@gcc.gnu.org
 Subject: A case exposing code sink issue
 
 Hi,
 
 For this small test case,
 
 int a[512] ;
 int b[512] ;
 int *a_p ;
 int *b_p ;
 int i ;
 int k ;
 
 int f(void)
 {
 for( k = 1 ; k = 9 ; k++)
 {
 for( i = k ; i  512 ; i += k)
 {
 a_p = a[i + (1k)] ;
 b_p = b[i + (1k)] ;
 *a_p = 7 ;
 *b_p = 7 ;
 }
 }
 }
 
 Before sink pass we have,
 
 f ()
 {
   int pretmp.11;
   int k.7;
   int i.6;
   int * b_p.3;
   int * a_p.2;
   int D.2248;
   int i.1;
   int D.2246;
   int k.0;
 
 bb 2:
   k = 1;
 
 bb 3:
   # k.0_9 = PHI k.7_20(11), 1(2)
   i = k.0_9;
   if (k.0_9 = 511)
 goto bb 7;
   else
 goto bb 8;
 
 bb 8:
 Invalid sum of incoming frequencies 900, should be 81
   goto bb 5;
 
 bb 7:
   pretmp.11_19 = 1  k.0_9;
 
 bb 4:
   # i.1_34 = PHI i.6_18(9), k.0_9(7)
   D.2246_5 = pretmp.11_19;
   D.2248_7 = i.1_34 + D.2246_5;
   a_p.2_8 = a[D.2248_7];
   a_p = a_p.2_8;
   b_p.3_13 = b[D.2248_7];
   b_p = b_p.3_13;
   MEM[(int *)a][D.2248_7] = 7;
   MEM[(int *)b][D.2248_7] = 7;
   i.6_18 = k.0_9 + i.1_34;
   i = i.6_18;
   if (i.6_18 = 511)
 goto bb 9;
   else
 goto bb 8;
 
 bb 9:
   goto bb 4;
 
 bb 5:
 Invalid sum of incoming frequencies 81, should be 900
   k.7_20 = k.0_9 + 1;
   k = k.7_20;
   if (k.7_20 = 9)
 goto bb 11;
   else
 goto bb 6;
 
 bb 11:
   goto bb 3;
 
 bb 6:
   return;
 
 }
 
 Can the following statements be sinked out of loop? I don't see this
 optimization happen in trunk. The consequence is register pressure
 increased
 and a spill/fill occurs in RA.
 
   a_p.2_8 = a[D.2248_7];
   a_p = a_p.2_8;
   b_p.3_13 = b[D.2248_7];
   b_p = b_p.3_13;
 
 I know the sink would happen in sink pass if a_p and b_p are local
 variables.
 
 If this is the root cause, which optimization pass in GCC take the role
 to
 sink them out of loop? How should we get it fixed?
 
 Thanks,
 -Jiangning
 
 
 






RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
One more question...

Can  i = i.6_18; be sinked out of loop, because it doesn't have memory
dependence with others?

Thanks,
-Jiangning

 -Original Message-
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 Jiangning Liu
 Sent: Thursday, November 24, 2011 2:57 PM
 To: gcc@gcc.gnu.org
 Subject: RE: A case exposing code sink issue
 
 Sorry, I realize we can't do that optimization because a_p may have
 dependence upon other memory accesses like MEM[(int *)a][D.2248_7].
 
 For example, if it happens a_p equals a_p, that optimization would be
 wrong.
 
 But can alias analysis solve the problem if we can guarantee (i+(1k))
 is
 less than the upbound of array a's definition?
 
 Or is there any GCC command line switch assuming no array bound
 overflow?
 That way we can do more aggressive optimizations, right?
 
 Thanks,
 -Jiangning
 
  -Original Message-
  From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
 Of
  Jiangning Liu
  Sent: Thursday, November 24, 2011 12:05 PM
  To: gcc@gcc.gnu.org
  Subject: A case exposing code sink issue
 
  Hi,
 
  For this small test case,
 
  int a[512] ;
  int b[512] ;
  int *a_p ;
  int *b_p ;
  int i ;
  int k ;
 
  int f(void)
  {
  for( k = 1 ; k = 9 ; k++)
  {
  for( i = k ; i  512 ; i += k)
  {
  a_p = a[i + (1k)] ;
  b_p = b[i + (1k)] ;
  *a_p = 7 ;
  *b_p = 7 ;
  }
  }
  }
 
  Before sink pass we have,
 
  f ()
  {
int pretmp.11;
int k.7;
int i.6;
int * b_p.3;
int * a_p.2;
int D.2248;
int i.1;
int D.2246;
int k.0;
 
  bb 2:
k = 1;
 
  bb 3:
# k.0_9 = PHI k.7_20(11), 1(2)
i = k.0_9;
if (k.0_9 = 511)
  goto bb 7;
else
  goto bb 8;
 
  bb 8:
  Invalid sum of incoming frequencies 900, should be 81
goto bb 5;
 
  bb 7:
pretmp.11_19 = 1  k.0_9;
 
  bb 4:
# i.1_34 = PHI i.6_18(9), k.0_9(7)
D.2246_5 = pretmp.11_19;
D.2248_7 = i.1_34 + D.2246_5;
a_p.2_8 = a[D.2248_7];
a_p = a_p.2_8;
b_p.3_13 = b[D.2248_7];
b_p = b_p.3_13;
MEM[(int *)a][D.2248_7] = 7;
MEM[(int *)b][D.2248_7] = 7;
i.6_18 = k.0_9 + i.1_34;
i = i.6_18;
if (i.6_18 = 511)
  goto bb 9;
else
  goto bb 8;
 
  bb 9:
goto bb 4;
 
  bb 5:
  Invalid sum of incoming frequencies 81, should be 900
k.7_20 = k.0_9 + 1;
k = k.7_20;
if (k.7_20 = 9)
  goto bb 11;
else
  goto bb 6;
 
  bb 11:
goto bb 3;
 
  bb 6:
return;
 
  }
 
  Can the following statements be sinked out of loop? I don't see this
  optimization happen in trunk. The consequence is register pressure
  increased
  and a spill/fill occurs in RA.
 
a_p.2_8 = a[D.2248_7];
a_p = a_p.2_8;
b_p.3_13 = b[D.2248_7];
b_p = b_p.3_13;
 
  I know the sink would happen in sink pass if a_p and b_p are local
  variables.
 
  If this is the root cause, which optimization pass in GCC take the
 role
  to
  sink them out of loop? How should we get it fixed?
 
  Thanks,
  -Jiangning