Re: conditional lim

2015-07-15 Thread Evgeniya Maenkova
'
 and
 'That is, if you move a conditionally executed statement but not any
 PHI node it feeds then the movement is useless.'
 I think about this example and don't understand. Could you clarify?

 Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n
 (and making it unconditionally execute before the loop).  In fact whether
 we can move it does not depend on whether 'cond' is invariant or not.

 If we couldn't execute m*n unconditionally on the loop preheader edge
 then we'd have to be able to also move the guarding condition.  Note
 that cost considerations shouldn't come into play here.

 So I was only thinking of the case where we also want to move the
 condition (which is wanted if we want to move a PHI node).


 Don't understand So I was only thinking of the case where we also
 want to move the condition (which is wanted if we want to move a PHI
 node).

 What about the case when we want to move condition but don't have phi
 node to move.

 Let's consider a little bit modified example:

 void bar (long);
 void foo (int cond1, int cond2, int cond3, int m, int n)
   {
  unsigned x;
  unsigned inv1;

  for (x = 0; x  1000; ++x)
{
  if (n)
{
  int tmp_inv1 = m/n;
  non_inv1 = tmp_inv1 + x;
 }
}
  bar (non_inv1);
  }
 We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
 build at hands right now to check. If I'm wrong let's replace m/n to
 something that couldn't be unconditionally moved :))? There is no phi
 node for tmp_inv1.
 Could you clarify what conditional lim should do in your opinion here?

 Nothing at this point.  If only then for the simple reason to make the
 first iteration of the patch simpler (still covering most cases).

 You seem to want to catch 100% of all possible cases.

 Is it too little gain to move statements that only used in the same bb
 or branch (I mean before phi node creation, even if stmt had phi
 node)?

 In fact, we could have here 2 thousands of divisions:
 if (n)
   {
 int tmp_inv1 = m/n;
 tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1);
 ...
 tmp_invN = something_which_couldn't_be moved_unconditionally 
 (tmp_invN-1);
 non_inv1 = tmp_invN + x;
   }

 Of course we could.  But the patch didn't even exercise this case (no 
 testcase).

Patch handles this case (this is why I created phi nodes which we
tried to discuss).
Your comment is about test case not about the patch.


 Please make patches simple - support for moving non-PHIs can be added
 as a followup.  The first important part is to make the current code
 (moving PHIs)
 create control-flow rather than if-converted form.

 Richard.



 3.3   replace_uses_in_conditions

 After computations movements we can have several copies of the cond
 stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For,
 cond stmt it doesn’t work. As cond stmt, of course, have something in
 lim_aux_data-tgt_level, but, in fact, they are moved only if
 corresponding invariants are moved. So, in fact, the same cond (copies
 of it, of course) could go to the different levels. So to fix these
 uses, we call replace_uses_in_conditions.

 Hmm, but stmts that a condition depends on always have at least the
 target level of the condition statement.  Thus their computation is
 available in all other same conditon statements that might appear
 in more inner loop levels, no?  So isn't this just about performing the
 movement in a proper ordering?

 Let me give your more details, then may be you repeat question in other 
 words.
 for ()
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }

 So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
 have tgt_level for inv3 and inv4 computed but we have not created
 copies of 'if (inv1  inv2)' created. In theory, we can have several
 copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
 initial cycle. if (1) is in tgt level of inv1 we need no replacement,
 in other cases we need replacements. Of course, we know this(all the
 info about tgt_levels) when we move inv1, but then we need to create
 all needed copies (store it somehow, etc) of 'if (inv1  inv2') and
 make replacement in these copies. I don't see now why this is much
 better. This doesn't mean that current solution is perfect, just
 clarify your thought.

 I still don't get what replacements actually mean.  Let's extend the
 example to

 for ()  (A)
   {
 for ()  (B)
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }
  }

 then you say the target level of inv1 could be 'A' and the target
 level of inv3 'B'
 and that of inv4 is 'B' as well.

 That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B' 
 guarded
 with if (inv1  inv2).  I

Re: conditional lim

2015-07-15 Thread Richard Biener
. Or something else.

 So, when I read
 'Only PHI nodes need PHI node creation'
 and
 'That is, if you move a conditionally executed statement but not any
 PHI node it feeds then the movement is useless.'
 I think about this example and don't understand. Could you clarify?

 Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n
 (and making it unconditionally execute before the loop).  In fact whether
 we can move it does not depend on whether 'cond' is invariant or not.

 If we couldn't execute m*n unconditionally on the loop preheader edge
 then we'd have to be able to also move the guarding condition.  Note
 that cost considerations shouldn't come into play here.

 So I was only thinking of the case where we also want to move the
 condition (which is wanted if we want to move a PHI node).


 Don't understand So I was only thinking of the case where we also
 want to move the condition (which is wanted if we want to move a PHI
 node).

 What about the case when we want to move condition but don't have phi
 node to move.

 Let's consider a little bit modified example:

 void bar (long);
 void foo (int cond1, int cond2, int cond3, int m, int n)
   {
  unsigned x;
  unsigned inv1;

  for (x = 0; x  1000; ++x)
{
  if (n)
{
  int tmp_inv1 = m/n;
  non_inv1 = tmp_inv1 + x;
 }
}
  bar (non_inv1);
  }
 We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
 build at hands right now to check. If I'm wrong let's replace m/n to
 something that couldn't be unconditionally moved :))? There is no phi
 node for tmp_inv1.
 Could you clarify what conditional lim should do in your opinion here?

 Nothing at this point.  If only then for the simple reason to make the
 first iteration of the patch simpler (still covering most cases).

 You seem to want to catch 100% of all possible cases.

 Is it too little gain to move statements that only used in the same bb
 or branch (I mean before phi node creation, even if stmt had phi
 node)?

 In fact, we could have here 2 thousands of divisions:
 if (n)
   {
 int tmp_inv1 = m/n;
 tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1);
 ...
 tmp_invN = something_which_couldn't_be moved_unconditionally 
 (tmp_invN-1);
 non_inv1 = tmp_invN + x;
   }

 Of course we could.  But the patch didn't even exercise this case (no 
 testcase).

 Patch handles this case (this is why I created phi nodes which we
 tried to discuss).
 Your comment is about test case not about the patch.

My comment was about the lack of a testcase for this in the patch submission.

But my whole point is that supporting this should be split off to a followup
patch to make both patches easier to review.

Richard.


 Please make patches simple - support for moving non-PHIs can be added
 as a followup.  The first important part is to make the current code
 (moving PHIs)
 create control-flow rather than if-converted form.

 Richard.



 3.3   replace_uses_in_conditions

 After computations movements we can have several copies of the cond
 stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For,
 cond stmt it doesn’t work. As cond stmt, of course, have something in
 lim_aux_data-tgt_level, but, in fact, they are moved only if
 corresponding invariants are moved. So, in fact, the same cond (copies
 of it, of course) could go to the different levels. So to fix these
 uses, we call replace_uses_in_conditions.

 Hmm, but stmts that a condition depends on always have at least the
 target level of the condition statement.  Thus their computation is
 available in all other same conditon statements that might appear
 in more inner loop levels, no?  So isn't this just about performing the
 movement in a proper ordering?

 Let me give your more details, then may be you repeat question in other 
 words.
 for ()
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }

 So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
 have tgt_level for inv3 and inv4 computed but we have not created
 copies of 'if (inv1  inv2)' created. In theory, we can have several
 copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
 initial cycle. if (1) is in tgt level of inv1 we need no replacement,
 in other cases we need replacements. Of course, we know this(all the
 info about tgt_levels) when we move inv1, but then we need to create
 all needed copies (store it somehow, etc) of 'if (inv1  inv2') and
 make replacement in these copies. I don't see now why this is much
 better. This doesn't mean that current solution is perfect, just
 clarify your thought.

 I still don't get what replacements actually mean.  Let's extend the
 example to

 for ()  (A)
   {
 for ()  (B)
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2

Re: conditional lim

2015-07-14 Thread Richard Biener
 then the movement is useless.'
 I think about this example and don't understand. Could you clarify?

 Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n
 (and making it unconditionally execute before the loop).  In fact whether
 we can move it does not depend on whether 'cond' is invariant or not.

 If we couldn't execute m*n unconditionally on the loop preheader edge
 then we'd have to be able to also move the guarding condition.  Note
 that cost considerations shouldn't come into play here.

 So I was only thinking of the case where we also want to move the
 condition (which is wanted if we want to move a PHI node).


 Don't understand So I was only thinking of the case where we also
 want to move the condition (which is wanted if we want to move a PHI
 node).

 What about the case when we want to move condition but don't have phi
 node to move.

 Let's consider a little bit modified example:

 void bar (long);
 void foo (int cond1, int cond2, int cond3, int m, int n)
   {
  unsigned x;
  unsigned inv1;

  for (x = 0; x  1000; ++x)
{
  if (n)
{
  int tmp_inv1 = m/n;
  non_inv1 = tmp_inv1 + x;
 }
}
  bar (non_inv1);
  }
 We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
 build at hands right now to check. If I'm wrong let's replace m/n to
 something that couldn't be unconditionally moved :))? There is no phi
 node for tmp_inv1.
 Could you clarify what conditional lim should do in your opinion here?

Nothing at this point.  If only then for the simple reason to make the
first iteration of the patch simpler (still covering most cases).

You seem to want to catch 100% of all possible cases.

 Is it too little gain to move statements that only used in the same bb
 or branch (I mean before phi node creation, even if stmt had phi
 node)?

 In fact, we could have here 2 thousands of divisions:
 if (n)
   {
 int tmp_inv1 = m/n;
 tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1);
 ...
 tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1);
 non_inv1 = tmp_invN + x;
   }

Of course we could.  But the patch didn't even exercise this case (no testcase).

Please make patches simple - support for moving non-PHIs can be added
as a followup.  The first important part is to make the current code
(moving PHIs)
create control-flow rather than if-converted form.

Richard.



 3.3   replace_uses_in_conditions

 After computations movements we can have several copies of the cond
 stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For,
 cond stmt it doesn’t work. As cond stmt, of course, have something in
 lim_aux_data-tgt_level, but, in fact, they are moved only if
 corresponding invariants are moved. So, in fact, the same cond (copies
 of it, of course) could go to the different levels. So to fix these
 uses, we call replace_uses_in_conditions.

 Hmm, but stmts that a condition depends on always have at least the
 target level of the condition statement.  Thus their computation is
 available in all other same conditon statements that might appear
 in more inner loop levels, no?  So isn't this just about performing the
 movement in a proper ordering?

 Let me give your more details, then may be you repeat question in other 
 words.
 for ()
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }

 So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
 have tgt_level for inv3 and inv4 computed but we have not created
 copies of 'if (inv1  inv2)' created. In theory, we can have several
 copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
 initial cycle. if (1) is in tgt level of inv1 we need no replacement,
 in other cases we need replacements. Of course, we know this(all the
 info about tgt_levels) when we move inv1, but then we need to create
 all needed copies (store it somehow, etc) of 'if (inv1  inv2') and
 make replacement in these copies. I don't see now why this is much
 better. This doesn't mean that current solution is perfect, just
 clarify your thought.

 I still don't get what replacements actually mean.  Let's extend the
 example to

 for ()  (A)
   {
 for ()  (B)
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }
  }

 then you say the target level of inv1 could be 'A' and the target
 level of inv3 'B'
 and that of inv4 is 'B' as well.

 That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B' 
 guarded
 with if (inv1  inv2).  I fail to see what copies we need here.  'inv1' is
 available in all target levels below its own, no copies needed.


 I meant the copies of conditions (COND_STMT: inv1  inv2).

 Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3

Re: conditional lim

2015-06-29 Thread Richard Biener
 to have a separate
 phase move_phis executed before move_computations, that just moves
 invariant PHI nodes (which this all is about - see above) and their
 controlling conditions and only then move their dependencies.
 I didn;t understand so far (I think let's clarify previous 
 questions,especially
 about phi nodes then let's go back to this one.)

Yeah, I think the PHI node thing is our major mis-understanding (or the
main thing that I think complicates the patch for too little gain - at least
initially).

Thanks,
Richard.


 Thanks,
 Richard.

 In short, that’s all.

 Thanks,

 Evgeniya



 On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if 
 (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my 
 patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

 the tests for conditional lim.

 4)  Patch testing:

 a)  make –k check (no difference in results for me for the clean
 build and the patched one,

 -  Revision: 222849,

 -  uname -a

Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

 b)  Bootstrap.

  It goes well now, however to fix it I have made a temporary hack in
 the lim code. And with this fix patch definitely shouldn’t be
 committed.

 I did so, as I would like to discuss this issue first.

 The problem is: I got stage2-stage3 comparison failure on the single
 file (tree-vect-data-refs.o). After some investigation I understood
 that tree-vect-data-refs.o differs being compiled with and without
 ‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
 and for stage 3 this is ‘-g –O2’. But to simplify things I can
 reproduce this difference on the same build (even not bootstrapped),
 with option ‘ –g’ and without it).

 Of course, the file compiled with –g option will contain debug
 information and will differ from the corresponding file without debug
 information. I mean there is the difference reported by
 contrib/compare-debug (I mean we talk about stripped files).

 Then I compared assemblers and lim dump files and made a conclusion
 the difference is:

 There is statement _484=something, which is conditionally moved out of
 loop X. In non debug cases no usages of _484 are left inside the loop
 X. In debug case, there is the single use in debug statement. So for
 debug case my patch generates phi statement for _484 to make it
 available inside the loop X. For non debug case, no such phi statement
 generated as there is no uses of _484.

 There is one more statement with lhs=_493, with exactly this pattern
 of use. But this is not important for our discussion.



 So, in my opinion it’s ok to generate additional phi node for debug
 case. But I’m not a compiler expert and maybe there is some
 requirement that debug and non-debug versions should differ only by
 debug statements, I don’t know.



 My temporary hack fix is just removing of such debug statements (no
 debug statements, no problems J

Re: conditional lim

2015-06-29 Thread Evgeniya Maenkova
?

 Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n
 (and making it unconditionally execute before the loop).  In fact whether
 we can move it does not depend on whether 'cond' is invariant or not.

 If we couldn't execute m*n unconditionally on the loop preheader edge
 then we'd have to be able to also move the guarding condition.  Note
 that cost considerations shouldn't come into play here.

 So I was only thinking of the case where we also want to move the
 condition (which is wanted if we want to move a PHI node).


Don't understand So I was only thinking of the case where we also
want to move the condition (which is wanted if we want to move a PHI
node).

What about the case when we want to move condition but don't have phi
node to move.

Let's consider a little bit modified example:

void bar (long);
void foo (int cond1, int cond2, int cond3, int m, int n)
  {
 unsigned x;
 unsigned inv1;

 for (x = 0; x  1000; ++x)
   {
 if (n)
   {
 int tmp_inv1 = m/n;
 non_inv1 = tmp_inv1 + x;
}
   }
 bar (non_inv1);
 }
We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
build at hands right now to check. If I'm wrong let's replace m/n to
something that couldn't be unconditionally moved :))? There is no phi
node for tmp_inv1.
Could you clarify what conditional lim should do in your opinion here?

Is it too little gain to move statements that only used in the same bb
or branch (I mean before phi node creation, even if stmt had phi
node)?

In fact, we could have here 2 thousands of divisions:
if (n)
  {
int tmp_inv1 = m/n;
tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1);
...
tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1);
non_inv1 = tmp_invN + x;
  }



 3.3   replace_uses_in_conditions

 After computations movements we can have several copies of the cond
 stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For,
 cond stmt it doesn’t work. As cond stmt, of course, have something in
 lim_aux_data-tgt_level, but, in fact, they are moved only if
 corresponding invariants are moved. So, in fact, the same cond (copies
 of it, of course) could go to the different levels. So to fix these
 uses, we call replace_uses_in_conditions.

 Hmm, but stmts that a condition depends on always have at least the
 target level of the condition statement.  Thus their computation is
 available in all other same conditon statements that might appear
 in more inner loop levels, no?  So isn't this just about performing the
 movement in a proper ordering?

 Let me give your more details, then may be you repeat question in other 
 words.
 for ()
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }

 So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
 have tgt_level for inv3 and inv4 computed but we have not created
 copies of 'if (inv1  inv2)' created. In theory, we can have several
 copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
 initial cycle. if (1) is in tgt level of inv1 we need no replacement,
 in other cases we need replacements. Of course, we know this(all the
 info about tgt_levels) when we move inv1, but then we need to create
 all needed copies (store it somehow, etc) of 'if (inv1  inv2') and
 make replacement in these copies. I don't see now why this is much
 better. This doesn't mean that current solution is perfect, just
 clarify your thought.

 I still don't get what replacements actually mean.  Let's extend the
 example to

 for ()  (A)
   {
 for ()  (B)
 {
   if (a)
 {
inv1 = some_computations;
if (inv1  inv2)
 {
   inv3;
   inv4;
 }
}
 }
  }

 then you say the target level of inv1 could be 'A' and the target
 level of inv3 'B'
 and that of inv4 is 'B' as well.

 That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B' 
 guarded
 with if (inv1  inv2).  I fail to see what copies we need here.  'inv1' is
 available in all target levels below its own, no copies needed.


I meant the copies of conditions (COND_STMT: inv1  inv2).

Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3) = A;
tgt_level(inv4) = B.

 Then
if (a)
   inv1=some_computations;
if (inv1  inv2)
   inv3;
preheader edge for A, where phi for inv1(phi_for_inv1) is computed
for ()  (A)
 {
 if (phi_for_inv1 inv2)
   inv3
 for ()  (B)
   {
   if (a)
 {
if (phi_for_inv1  inv2)
  {
  }
 }
 }
So, we have 3 copies of inv1  inv2 after lim (in some of them we use
inv1, in other phi_for_inv1). But this is the second priority for now
(this is all about whether we should move statements w/o phi node).


 More details on 3.1 and 3.2

 3.1 The patch is very similar for phi stmts and regular stmts

Re: conditional lim

2015-06-09 Thread Richard Biener
).

 Add_info_for_phi calls create_tmp_var which requires some explanation.

 Create_tmp_var

 To create new names for phi nodes and to create default def for phi
 arguments I use make_ssa_name and get_or_create_ssa_default_def. These
 methods have some asserts (being merged):   (TREE_CODE (old_var) ==
 VAR_DECL   || TREE_CODE (old_var) == PARM_DECL   || TREE_CODE
 (old_var) == RESULT_DECL).

 So in some cases (when I know that I need to use methods mentioned
 above, see the start of create_tmp_var, ie the uses in other loops) I
 create new tmp variable to overcome these asserts. To be honest I
 don’t like this but don’t know how to deal with this better.

Hmm, don't you just run into SSA names here?  That is, TREE_CODE
(old_var) == SSA_NAME?
You can use make_ssa_name (TREE_TYPE (old_var), plim) in this case.

Or you run into the issue that anonymous SSA names cannot have default
defintions (I don't understand why you need all the fuss about default defs).

 And a couple of words regarding where we store info for phi nodes:

 Each loop contains in its aux  phi_data structure. On this stage we
 only add there stmt (if phi node will be required), phi name=phi node
 result name in names_map or fl_names_map). See the comment about
 phi_data in patch.

 If there should be created several phi nodes for given lhs (I mean the
 first place for phi node doesn’t dominate on corresponding preheader,
 we add only the last name in names_map (as intermediate names could be
 created later, but the last name in bb which dominates preheader
 should be used for replacement in other loops. Replacements were
 already made in walker).

 If lhs is used only in phi node, which are moved and transformed into
 assignment, and there is no uses in other loops, we need to create
 only first level phi node (don’t need to create phi nodes till some bb
 which dominates preheader), then we add such name to fl_names_map.

 3.2 Create_phi_nodes

 For each of the loops we go over ((phi_data*) loop-aux)-stmts. These
 are statements which were moved, but they have uses in the original
 loop (more exactly, their uses in some other loops replaced by some
 name from ((phi_data*) loop-aux)-names_map or ((phi_data*)
 loop-aux)-fl_names_map.

 So for each of such stmts we create phi nodes (for its lhs) chain.

 Basic block for phi node is found in get_bb_for_phi  for given bb with
 stmt. If basic block for phi node dominates preheader-src we end
 chain creation. There is some special condition for the case when we
 need to create only first level phi node. (I described this situation
 above, but let me know If I need to add  more details.

 Basic blocks can have maps to identify if we created phi node for
 given variable in given basic_block, so we only add an edge argument
 for such case (phi_map = new hash_mapbasic_block, hash_maptree,
 gphi**).

Ok.

Well - I think it might be easier code-generation-wise to have a separate
phase move_phis executed before move_computations, that just moves
invariant PHI nodes (which this all is about - see above) and their
controlling conditions and only then move their dependencies.

Thanks,
Richard.

 In short, that’s all.

 Thanks,

 Evgeniya



 On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working

Re: conditional lim

2015-06-09 Thread Evgeniya Maenkova
 defs).

I used default defs to write something in phi args when I have no values on
corresponding branches. Say there is some tmp value which I should
create phi for.
if (a)
   {
 tmpvar=something
 b = something(tmpvar);
}
else
   {
   }
When I move if (a) {tmpvar = something} and create phi for tmpvar (to
use phi result
in the initial cycle) i need to write something for 'not a' phi arg.
BUT, in some cases I use
 build_zero_cst (which I started to use after default defs use. I mean
I didn't know about
 build_zero_cst while I started to use default defs.). So may be I can replace
uses of default defs in such cases by build_zero_cst? I mean just SOME values
(as I know that control flow will never go to this places actually). I
mean if !a tmpvar will not used.

 And a couple of words regarding where we store info for phi nodes:

 Each loop contains in its aux  phi_data structure. On this stage we
 only add there stmt (if phi node will be required), phi name=phi node
 result name in names_map or fl_names_map). See the comment about
 phi_data in patch.

 If there should be created several phi nodes for given lhs (I mean the
 first place for phi node doesn’t dominate on corresponding preheader,
 we add only the last name in names_map (as intermediate names could be
 created later, but the last name in bb which dominates preheader
 should be used for replacement in other loops. Replacements were
 already made in walker).

 If lhs is used only in phi node, which are moved and transformed into
 assignment, and there is no uses in other loops, we need to create
 only first level phi node (don’t need to create phi nodes till some bb
 which dominates preheader), then we add such name to fl_names_map.

 3.2 Create_phi_nodes

 For each of the loops we go over ((phi_data*) loop-aux)-stmts. These
 are statements which were moved, but they have uses in the original
 loop (more exactly, their uses in some other loops replaced by some
 name from ((phi_data*) loop-aux)-names_map or ((phi_data*)
 loop-aux)-fl_names_map.

 So for each of such stmts we create phi nodes (for its lhs) chain.

 Basic block for phi node is found in get_bb_for_phi  for given bb with
 stmt. If basic block for phi node dominates preheader-src we end
 chain creation. There is some special condition for the case when we
 need to create only first level phi node. (I described this situation
 above, but let me know If I need to add  more details.

 Basic blocks can have maps to identify if we created phi node for
 given variable in given basic_block, so we only add an edge argument
 for such case (phi_map = new hash_mapbasic_block, hash_maptree,
 gphi**).

 Ok.

 Well - I think it might be easier code-generation-wise to have a separate
 phase move_phis executed before move_computations, that just moves
 invariant PHI nodes (which this all is about - see above) and their
 controlling conditions and only then move their dependencies.
I didn;t understand so far (I think let's clarify previous questions,especially
about phi nodes then let's go back to this one.)

 Thanks,
 Richard.

 In short, that’s all.

 Thanks,

 Evgeniya



 On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b

Re: conditional lim

2015-05-29 Thread Evgeniya Maenkova
for such case (phi_map = new hash_mapbasic_block, hash_maptree,
gphi**).

In short, that’s all.

Thanks,

Evgeniya



On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

 the tests for conditional lim.

 4)  Patch testing:

 a)  make –k check (no difference in results for me for the clean
 build and the patched one,

 -  Revision: 222849,

 -  uname -a

Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

 b)  Bootstrap.

  It goes well now, however to fix it I have made a temporary hack in
 the lim code. And with this fix patch definitely shouldn’t be
 committed.

 I did so, as I would like to discuss this issue first.

 The problem is: I got stage2-stage3 comparison failure on the single
 file (tree-vect-data-refs.o). After some investigation I understood
 that tree-vect-data-refs.o differs being compiled with and without
 ‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
 and for stage 3 this is ‘-g –O2’. But to simplify things I can
 reproduce this difference on the same build (even not bootstrapped),
 with option ‘ –g’ and without it).

 Of course, the file compiled with –g option will contain debug
 information and will differ from the corresponding file without debug
 information. I mean there is the difference reported by
 contrib/compare-debug (I mean we talk about stripped files).

 Then I compared assemblers and lim dump files and made a conclusion
 the difference is:

 There is statement _484=something, which is conditionally moved out of
 loop X. In non debug cases no usages of _484 are left inside the loop
 X. In debug case, there is the single use in debug statement. So for
 debug case my patch generates phi statement for _484 to make it
 available inside the loop X. For non debug case, no such phi statement
 generated as there is no uses of _484.

 There is one more statement with lhs=_493, with exactly this pattern
 of use. But this is not important for our discussion.



 So, in my opinion it’s ok to generate additional phi node for debug
 case. But I’m not a compiler expert and maybe there is some
 requirement that debug and non-debug versions should differ only by
 debug statements, I don’t know.



 My temporary hack fix is just removing of such debug statements (no
 debug statements, no problems J).



 The alternatives I see are:

 -  Move debug statements outside the loop (not good in my opinion);

 -  Somehow reset values in debug statements (not good in my
 opinion, if I could provide correct information for them);

 -  Generate phi for debug statements and fix bootstrap (say,
 let’s have some list of files, which we have special check for. I mean
 for my case, the difference is not in stage2 and stage3, it’s in debug
 and non-debug). I like this variant

Re: conditional lim

2015-05-27 Thread Richard Biener
On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
evgeniya.maenk...@gmail.com wrote:
 Hi, Richard

 Thanks for review starting.

 Do you see any major issues with this patch (i.e. algorithms and ideas
 that should be completely replaced, effectively causing the re-write
 of most code)?

 To decide if there are major issues in the patch, perhaps, you need
 additional clarifications from me? Could you point at the places where
 additional explanations could save you most effort?

 Your answers to these questions are looking the first priority ones.
 You wrote about several issues in the code, which are looking as easy
 (or almost easy ;) to fix(inline functions, unswitch-loops flag,
 comments, etc). But, I think you agree, let’s first decide about the
 major issues (I mean, whether we continue with this patch or starting
 new one, this will save a lot of time for both of us).

I didn't get an overall idea on how the patch works, that is, how it integrates
with the existing algorithm.  If you can elaborate on that a bit that would
be helpful.

I think the code-generation part needs some work (whether by following
my idea with re-using copy_bbs or whether by basically re-implementing
it is up to debate).  How does your code handle

  for ()
{
   if (cond1)
{
   if (cond2)
 invariant;
   if (cond3)
 invariant;
}
}

?  Optimally we'd have before the loop exactly the same if () structure
(thus if (cond1) is shared).

Richard.


 Thanks,

 Evgeniya

 On Tue, May 26, 2015 at 2:31 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

 the tests for conditional lim.

 4)  Patch testing:

 a)  make –k check (no difference in results for me for the clean
 build and the patched one,

 -  Revision: 222849,

 -  uname -a

Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

 b)  Bootstrap.

  It goes well now, however to fix it I have made a temporary hack in
 the lim code. And with this fix patch definitely shouldn’t be
 committed.

 I did so, as I would like to discuss this issue first.

 The problem is: I got stage2-stage3 comparison failure on the single
 file (tree-vect-data-refs.o). After some investigation I understood
 that tree-vect-data-refs.o differs being compiled with and without
 ‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
 and for stage 3 this is ‘-g –O2’. But to simplify things I can
 reproduce this difference on the same build (even not bootstrapped),
 with option ‘ –g’ and without it).

 Of course, the file compiled with –g option will contain debug
 information and will differ from the corresponding file without debug
 information. I mean there is the difference reported

Re: conditional lim

2015-05-27 Thread Evgeniya Maenkova
On Wed, May 27, 2015 at 2:11 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi, Richard

 Thanks for review starting.

 Do you see any major issues with this patch (i.e. algorithms and ideas
 that should be completely replaced, effectively causing the re-write
 of most code)?

 To decide if there are major issues in the patch, perhaps, you need
 additional clarifications from me? Could you point at the places where
 additional explanations could save you most effort?

 Your answers to these questions are looking the first priority ones.
 You wrote about several issues in the code, which are looking as easy
 (or almost easy ;) to fix(inline functions, unswitch-loops flag,
 comments, etc). But, I think you agree, let’s first decide about the
 major issues (I mean, whether we continue with this patch or starting
 new one, this will save a lot of time for both of us).

 I didn't get an overall idea on how the patch works, that is, how it 
 integrates
 with the existing algorithm.  If you can elaborate on that a bit that would
 be helpful.

Hi,
Sure, I'll write you some notes in several days.

 I think the code-generation part needs some work (whether by following
 my idea with re-using copy_bbs or whether by basically re-implementing
 it is up to debate).  How does your code handle

   for ()
 {
if (cond1)
 {
if (cond2)
  invariant;
if (cond3)
  invariant;
 }
 }

 ?  Optimally we'd have before the loop exactly the same if () structure
 (thus if (cond1) is shared).

If both invariants are going out of the same loop (i mean tgt_level),
then if structure will be the same.
for1()
  for ()
{
  if (cond1)
{
  if (cond2)
invariant1;
  if (cond3)
invariant2;
}
}

will be transformed to
for1()
  if (cond1)
{
  if (cond2)
invariant1;
  if (cond3)
invariant2;
 }
  }
  for ()
{
  if (cond1)
{
  if (cond2);
  if (cond3);
}
}
(I don't cleanup empty if's in lim code).


If these invarians are moved in different loops then

for1
  for2()
for()
  {
if (cond1)
  {
if (cond2)
  invariant1;
if (cond3)
  invariant2;
   }
   }

will be transformed to:
for1
  {
if (cond1)
  if (cond2)
invariant1;
for2()
  {
 if (cond1)
   if (cond3)
 invariant2;
 for()
   {
  if (cond1)
   {
 if (cond2);
 if (cond3);
   }
   }
}
   }

Of course, there could be some bugs, but the idea was as mentioned above.

This transformation was looking logical to me. What do you think?

Thanks,

Evgeniya


 Richard.





 Thanks,

 Evgeniya

 On Tue, May 26, 2015 at 2:31 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if 
 (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my 
 patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc

Re: conditional lim

2015-05-26 Thread Richard Biener
On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

 the tests for conditional lim.

 4)  Patch testing:

 a)  make –k check (no difference in results for me for the clean
 build and the patched one,

 -  Revision: 222849,

 -  uname -a

Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

 b)  Bootstrap.

  It goes well now, however to fix it I have made a temporary hack in
 the lim code. And with this fix patch definitely shouldn’t be
 committed.

 I did so, as I would like to discuss this issue first.

 The problem is: I got stage2-stage3 comparison failure on the single
 file (tree-vect-data-refs.o). After some investigation I understood
 that tree-vect-data-refs.o differs being compiled with and without
 ‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
 and for stage 3 this is ‘-g –O2’. But to simplify things I can
 reproduce this difference on the same build (even not bootstrapped),
 with option ‘ –g’ and without it).

 Of course, the file compiled with –g option will contain debug
 information and will differ from the corresponding file without debug
 information. I mean there is the difference reported by
 contrib/compare-debug (I mean we talk about stripped files).

 Then I compared assemblers and lim dump files and made a conclusion
 the difference is:

 There is statement _484=something, which is conditionally moved out of
 loop X. In non debug cases no usages of _484 are left inside the loop
 X. In debug case, there is the single use in debug statement. So for
 debug case my patch generates phi statement for _484 to make it
 available inside the loop X. For non debug case, no such phi statement
 generated as there is no uses of _484.

 There is one more statement with lhs=_493, with exactly this pattern
 of use. But this is not important for our discussion.



 So, in my opinion it’s ok to generate additional phi node for debug
 case. But I’m not a compiler expert and maybe there is some
 requirement that debug and non-debug versions should differ only by
 debug statements, I don’t know.

As Andi said code generation needs to be the same.

 My temporary hack fix is just removing of such debug statements (no
 debug statements, no problems J).

That's fine and generally what is done in other passes.

 The alternatives I see are:

 -  Move debug statements outside the loop (not good in my opinion);

 -  Somehow reset values in debug statements (not good in my
 opinion, if I could provide correct information for them);

You could re-set them via gimple_debug_bind_reset_value which
will tell the user that at this point the value is optimized out
(that's slightly
better than removing the debug stmts).

 -  Generate phi for debug statements and fix bootstrap (say,
 let’s

Re: conditional lim

2015-05-26 Thread Evgeniya Maenkova
Hi, Richard

Thanks for review starting.

Do you see any major issues with this patch (i.e. algorithms and ideas
that should be completely replaced, effectively causing the re-write
of most code)?

To decide if there are major issues in the patch, perhaps, you need
additional clarifications from me? Could you point at the places where
additional explanations could save you most effort?

Your answers to these questions are looking the first priority ones.
You wrote about several issues in the code, which are looking as easy
(or almost easy ;) to fix(inline functions, unswitch-loops flag,
comments, etc). But, I think you agree, let’s first decide about the
major issues (I mean, whether we continue with this patch or starting
new one, this will save a lot of time for both of us).

Thanks,

Evgeniya

On Tue, May 26, 2015 at 2:31 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
 evgeniya.maenk...@gmail.com wrote:
 Hi,

 Could you please review my patch for predicated lim?

 Let me note some details about it:



 1)  Phi statements are still moved only if they have 1 or 2
 arguments. However, phi statements could be move under conditions (as
 it’s done for the other statements).  Probably, phi statement motion
 with 3 + arguments could be implemented in the next patch after
 predicated lim.

 2)  Patch has limitations/features like (it was ok to me to
 implement it such way, maybe I’m not correct. ):

 a)  Loop1

 {

   If (a)

  Loop2

  {

Stmt - Invariant for Loop1

  }

  }

In this case Stmt will be moved only out of Loop2, because of if (a).

 b)  Or

 Loop1

 {

  …

  If (cond1)

   If (cond2)

   If (cond3)

   Stmt;

}

 Stmt will be moved out only if cond1 is always executed in Loop1.

 c)   It took me a long time to write all of these code, so there
 might be other peculiarities which I forgot to mention. :)

Let’s discuss these ones as you will review my patch.

 3)  Patch consists of 9 files:

 a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

 -  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
 predicated lim moves 2 more statements out of the loop;

 -  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
 lim recip optimization in this test doesn’t work (the corresponding
 value is below threshold as I could see in the code for recip, 13).
 So to have recip working in this test I changed test a little bit.

 b)  gcc/tree-ssa-loop-im.c – the patched lim per se

 c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

 the tests for conditional lim.

 4)  Patch testing:

 a)  make –k check (no difference in results for me for the clean
 build and the patched one,

 -  Revision: 222849,

 -  uname -a

Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

 b)  Bootstrap.

  It goes well now, however to fix it I have made a temporary hack in
 the lim code. And with this fix patch definitely shouldn’t be
 committed.

 I did so, as I would like to discuss this issue first.

 The problem is: I got stage2-stage3 comparison failure on the single
 file (tree-vect-data-refs.o). After some investigation I understood
 that tree-vect-data-refs.o differs being compiled with and without
 ‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
 and for stage 3 this is ‘-g –O2’. But to simplify things I can
 reproduce this difference on the same build (even not bootstrapped),
 with option ‘ –g’ and without it).

 Of course, the file compiled with –g option will contain debug
 information and will differ from the corresponding file without debug
 information. I mean there is the difference reported by
 contrib/compare-debug (I mean we talk about stripped files).

 Then I compared assemblers and lim dump files and made a conclusion
 the difference is:

 There is statement _484=something, which is conditionally moved out of
 loop X. In non debug cases no usages of _484 are left inside the loop
 X. In debug case, there is the single use in debug statement. So for
 debug case my patch generates phi statement for _484 to make it
 available inside the loop X. For non debug case, no such phi statement
 generated as there is no uses of _484.

 There is one more statement with lhs=_493, with exactly this pattern
 of use. But this is not important for our discussion.



 So, in my opinion it’s ok to generate additional phi

[patch] conditional lim

2015-05-10 Thread Evgeniya Maenkova
[subj corrected]

Hi, Andy

Thanks for your clarification. Then I'll try to implement some chain
of debug statements to make the value computed in the branch available
inside the loop.
Or may be you have better ideas?


Waiting for further comments regarding the whole patch from you,
Richard and other developers.


Thanks,

Evgeniya

On Sat, May 9, 2015 at 4:41 PM, Andi Kleen a...@firstfloor.org wrote:
 Evgeniya Maenkova evgeniya.maenk...@gmail.com writes:

 So, in my opinion it’s ok to generate additional phi node for debug
 case. But I’m not a compiler expert and maybe there is some
 requirement that debug and non-debug versions should differ only by
 debug statements, I don’t know.

 gcc has such a requirement.

 Otherwise you cannot debug code that was originally not compiled
 with debugging.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only



-- 
Thanks,

Evgeniya

perfstories.wordpress.com


Re: conditional lim

2015-05-09 Thread Andi Kleen
Evgeniya Maenkova evgeniya.maenk...@gmail.com writes:

 So, in my opinion it’s ok to generate additional phi node for debug
 case. But I’m not a compiler expert and maybe there is some
 requirement that debug and non-debug versions should differ only by
 debug statements, I don’t know.

gcc has such a requirement.

Otherwise you cannot debug code that was originally not compiled
with debugging.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


conditional lim

2015-05-08 Thread Evgeniya Maenkova
Hi,

Could you please review my patch for predicated lim?

Let me note some details about it:



1)  Phi statements are still moved only if they have 1 or 2
arguments. However, phi statements could be move under conditions (as
it’s done for the other statements).  Probably, phi statement motion
with 3 + arguments could be implemented in the next patch after
predicated lim.

2)  Patch has limitations/features like (it was ok to me to
implement it such way, maybe I’m not correct. ):

a)  Loop1

{

  If (a)

 Loop2

 {

   Stmt - Invariant for Loop1

 }

 }

   In this case Stmt will be moved only out of Loop2, because of if (a).

b)  Or

Loop1

{

 …

 If (cond1)

  If (cond2)

  If (cond3)

  Stmt;

   }

Stmt will be moved out only if cond1 is always executed in Loop1.

c)   It took me a long time to write all of these code, so there
might be other peculiarities which I forgot to mention. :)

   Let’s discuss these ones as you will review my patch.

3)  Patch consists of 9 files:

a)  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests:

-  gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
predicated lim moves 2 more statements out of the loop;

-  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional
lim recip optimization in this test doesn’t work (the corresponding
value is below threshold as I could see in the code for recip, 13).
So to have recip working in this test I changed test a little bit.

b)  gcc/tree-ssa-loop-im.c – the patched lim per se

c)   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c

the tests for conditional lim.

4)  Patch testing:

a)  make –k check (no difference in results for me for the clean
build and the patched one,

-  Revision: 222849,

-  uname -a

   Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux

b)  Bootstrap.

 It goes well now, however to fix it I have made a temporary hack in
the lim code. And with this fix patch definitely shouldn’t be
committed.

I did so, as I would like to discuss this issue first.

The problem is: I got stage2-stage3 comparison failure on the single
file (tree-vect-data-refs.o). After some investigation I understood
that tree-vect-data-refs.o differs being compiled with and without
‘-g’ option (yes, more exactly on stage 2 this is  ‘-g –O2 –gtoggle’,
and for stage 3 this is ‘-g –O2’. But to simplify things I can
reproduce this difference on the same build (even not bootstrapped),
with option ‘ –g’ and without it).

Of course, the file compiled with –g option will contain debug
information and will differ from the corresponding file without debug
information. I mean there is the difference reported by
contrib/compare-debug (I mean we talk about stripped files).

Then I compared assemblers and lim dump files and made a conclusion
the difference is:

There is statement _484=something, which is conditionally moved out of
loop X. In non debug cases no usages of _484 are left inside the loop
X. In debug case, there is the single use in debug statement. So for
debug case my patch generates phi statement for _484 to make it
available inside the loop X. For non debug case, no such phi statement
generated as there is no uses of _484.

There is one more statement with lhs=_493, with exactly this pattern
of use. But this is not important for our discussion.



So, in my opinion it’s ok to generate additional phi node for debug
case. But I’m not a compiler expert and maybe there is some
requirement that debug and non-debug versions should differ only by
debug statements, I don’t know.



My temporary hack fix is just removing of such debug statements (no
debug statements, no problems J).



The alternatives I see are:

-  Move debug statements outside the loop (not good in my opinion);

-  Somehow reset values in debug statements (not good in my
opinion, if I could provide correct information for them);

-  Generate phi for debug statements and fix bootstrap (say,
let’s have some list of files, which we have special check for. I mean
for my case, the difference is not in stage2 and stage3, it’s in debug
and non-debug). I like this variant. However, as I don’t see such list
of special files in the whole gcc, I think I might be missing
something.

What do you think?



Thanks,



Evgeniya


patch_May8_3
Description: Binary data