Re: conditional lim
' and 'That is, if you move a conditionally executed statement but not any PHI node it feeds then the movement is useless.' I think about this example and don't understand. Could you clarify? Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n (and making it unconditionally execute before the loop). In fact whether we can move it does not depend on whether 'cond' is invariant or not. If we couldn't execute m*n unconditionally on the loop preheader edge then we'd have to be able to also move the guarding condition. Note that cost considerations shouldn't come into play here. So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). Don't understand So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). What about the case when we want to move condition but don't have phi node to move. Let's consider a little bit modified example: void bar (long); void foo (int cond1, int cond2, int cond3, int m, int n) { unsigned x; unsigned inv1; for (x = 0; x 1000; ++x) { if (n) { int tmp_inv1 = m/n; non_inv1 = tmp_inv1 + x; } } bar (non_inv1); } We couldn't move tmp_inv1 uncoditionally. Right (I have no clean build at hands right now to check. If I'm wrong let's replace m/n to something that couldn't be unconditionally moved :))? There is no phi node for tmp_inv1. Could you clarify what conditional lim should do in your opinion here? Nothing at this point. If only then for the simple reason to make the first iteration of the patch simpler (still covering most cases). You seem to want to catch 100% of all possible cases. Is it too little gain to move statements that only used in the same bb or branch (I mean before phi node creation, even if stmt had phi node)? In fact, we could have here 2 thousands of divisions: if (n) { int tmp_inv1 = m/n; tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1); ... tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1); non_inv1 = tmp_invN + x; } Of course we could. But the patch didn't even exercise this case (no testcase). Patch handles this case (this is why I created phi nodes which we tried to discuss). Your comment is about test case not about the patch. Please make patches simple - support for moving non-PHIs can be added as a followup. The first important part is to make the current code (moving PHIs) create control-flow rather than if-converted form. Richard. 3.3 replace_uses_in_conditions After computations movements we can have several copies of the cond stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For, cond stmt it doesn’t work. As cond stmt, of course, have something in lim_aux_data-tgt_level, but, in fact, they are moved only if corresponding invariants are moved. So, in fact, the same cond (copies of it, of course) could go to the different levels. So to fix these uses, we call replace_uses_in_conditions. Hmm, but stmts that a condition depends on always have at least the target level of the condition statement. Thus their computation is available in all other same conditon statements that might appear in more inner loop levels, no? So isn't this just about performing the movement in a proper ordering? Let me give your more details, then may be you repeat question in other words. for () { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } So, the order for movement is: inv1; inv3; inv4. When we move inv1 we have tgt_level for inv3 and inv4 computed but we have not created copies of 'if (inv1 inv2)' created. In theory, we can have several copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in initial cycle. if (1) is in tgt level of inv1 we need no replacement, in other cases we need replacements. Of course, we know this(all the info about tgt_levels) when we move inv1, but then we need to create all needed copies (store it somehow, etc) of 'if (inv1 inv2') and make replacement in these copies. I don't see now why this is much better. This doesn't mean that current solution is perfect, just clarify your thought. I still don't get what replacements actually mean. Let's extend the example to for () (A) { for () (B) { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } } then you say the target level of inv1 could be 'A' and the target level of inv3 'B' and that of inv4 is 'B' as well. That's true. So we move inv1 to before 'A' and inv3/inv4 to before 'B' guarded with if (inv1 inv2). I
Re: conditional lim
. Or something else. So, when I read 'Only PHI nodes need PHI node creation' and 'That is, if you move a conditionally executed statement but not any PHI node it feeds then the movement is useless.' I think about this example and don't understand. Could you clarify? Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n (and making it unconditionally execute before the loop). In fact whether we can move it does not depend on whether 'cond' is invariant or not. If we couldn't execute m*n unconditionally on the loop preheader edge then we'd have to be able to also move the guarding condition. Note that cost considerations shouldn't come into play here. So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). Don't understand So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). What about the case when we want to move condition but don't have phi node to move. Let's consider a little bit modified example: void bar (long); void foo (int cond1, int cond2, int cond3, int m, int n) { unsigned x; unsigned inv1; for (x = 0; x 1000; ++x) { if (n) { int tmp_inv1 = m/n; non_inv1 = tmp_inv1 + x; } } bar (non_inv1); } We couldn't move tmp_inv1 uncoditionally. Right (I have no clean build at hands right now to check. If I'm wrong let's replace m/n to something that couldn't be unconditionally moved :))? There is no phi node for tmp_inv1. Could you clarify what conditional lim should do in your opinion here? Nothing at this point. If only then for the simple reason to make the first iteration of the patch simpler (still covering most cases). You seem to want to catch 100% of all possible cases. Is it too little gain to move statements that only used in the same bb or branch (I mean before phi node creation, even if stmt had phi node)? In fact, we could have here 2 thousands of divisions: if (n) { int tmp_inv1 = m/n; tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1); ... tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1); non_inv1 = tmp_invN + x; } Of course we could. But the patch didn't even exercise this case (no testcase). Patch handles this case (this is why I created phi nodes which we tried to discuss). Your comment is about test case not about the patch. My comment was about the lack of a testcase for this in the patch submission. But my whole point is that supporting this should be split off to a followup patch to make both patches easier to review. Richard. Please make patches simple - support for moving non-PHIs can be added as a followup. The first important part is to make the current code (moving PHIs) create control-flow rather than if-converted form. Richard. 3.3 replace_uses_in_conditions After computations movements we can have several copies of the cond stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For, cond stmt it doesn’t work. As cond stmt, of course, have something in lim_aux_data-tgt_level, but, in fact, they are moved only if corresponding invariants are moved. So, in fact, the same cond (copies of it, of course) could go to the different levels. So to fix these uses, we call replace_uses_in_conditions. Hmm, but stmts that a condition depends on always have at least the target level of the condition statement. Thus their computation is available in all other same conditon statements that might appear in more inner loop levels, no? So isn't this just about performing the movement in a proper ordering? Let me give your more details, then may be you repeat question in other words. for () { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } So, the order for movement is: inv1; inv3; inv4. When we move inv1 we have tgt_level for inv3 and inv4 computed but we have not created copies of 'if (inv1 inv2)' created. In theory, we can have several copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in initial cycle. if (1) is in tgt level of inv1 we need no replacement, in other cases we need replacements. Of course, we know this(all the info about tgt_levels) when we move inv1, but then we need to create all needed copies (store it somehow, etc) of 'if (inv1 inv2') and make replacement in these copies. I don't see now why this is much better. This doesn't mean that current solution is perfect, just clarify your thought. I still don't get what replacements actually mean. Let's extend the example to for () (A) { for () (B) { if (a) { inv1 = some_computations; if (inv1 inv2
Re: conditional lim
then the movement is useless.' I think about this example and don't understand. Could you clarify? Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n (and making it unconditionally execute before the loop). In fact whether we can move it does not depend on whether 'cond' is invariant or not. If we couldn't execute m*n unconditionally on the loop preheader edge then we'd have to be able to also move the guarding condition. Note that cost considerations shouldn't come into play here. So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). Don't understand So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). What about the case when we want to move condition but don't have phi node to move. Let's consider a little bit modified example: void bar (long); void foo (int cond1, int cond2, int cond3, int m, int n) { unsigned x; unsigned inv1; for (x = 0; x 1000; ++x) { if (n) { int tmp_inv1 = m/n; non_inv1 = tmp_inv1 + x; } } bar (non_inv1); } We couldn't move tmp_inv1 uncoditionally. Right (I have no clean build at hands right now to check. If I'm wrong let's replace m/n to something that couldn't be unconditionally moved :))? There is no phi node for tmp_inv1. Could you clarify what conditional lim should do in your opinion here? Nothing at this point. If only then for the simple reason to make the first iteration of the patch simpler (still covering most cases). You seem to want to catch 100% of all possible cases. Is it too little gain to move statements that only used in the same bb or branch (I mean before phi node creation, even if stmt had phi node)? In fact, we could have here 2 thousands of divisions: if (n) { int tmp_inv1 = m/n; tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1); ... tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1); non_inv1 = tmp_invN + x; } Of course we could. But the patch didn't even exercise this case (no testcase). Please make patches simple - support for moving non-PHIs can be added as a followup. The first important part is to make the current code (moving PHIs) create control-flow rather than if-converted form. Richard. 3.3 replace_uses_in_conditions After computations movements we can have several copies of the cond stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For, cond stmt it doesn’t work. As cond stmt, of course, have something in lim_aux_data-tgt_level, but, in fact, they are moved only if corresponding invariants are moved. So, in fact, the same cond (copies of it, of course) could go to the different levels. So to fix these uses, we call replace_uses_in_conditions. Hmm, but stmts that a condition depends on always have at least the target level of the condition statement. Thus their computation is available in all other same conditon statements that might appear in more inner loop levels, no? So isn't this just about performing the movement in a proper ordering? Let me give your more details, then may be you repeat question in other words. for () { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } So, the order for movement is: inv1; inv3; inv4. When we move inv1 we have tgt_level for inv3 and inv4 computed but we have not created copies of 'if (inv1 inv2)' created. In theory, we can have several copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in initial cycle. if (1) is in tgt level of inv1 we need no replacement, in other cases we need replacements. Of course, we know this(all the info about tgt_levels) when we move inv1, but then we need to create all needed copies (store it somehow, etc) of 'if (inv1 inv2') and make replacement in these copies. I don't see now why this is much better. This doesn't mean that current solution is perfect, just clarify your thought. I still don't get what replacements actually mean. Let's extend the example to for () (A) { for () (B) { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } } then you say the target level of inv1 could be 'A' and the target level of inv3 'B' and that of inv4 is 'B' as well. That's true. So we move inv1 to before 'A' and inv3/inv4 to before 'B' guarded with if (inv1 inv2). I fail to see what copies we need here. 'inv1' is available in all target levels below its own, no copies needed. I meant the copies of conditions (COND_STMT: inv1 inv2). Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3
Re: conditional lim
to have a separate phase move_phis executed before move_computations, that just moves invariant PHI nodes (which this all is about - see above) and their controlling conditions and only then move their dependencies. I didn;t understand so far (I think let's clarify previous questions,especially about phi nodes then let's go back to this one.) Yeah, I think the PHI node thing is our major mis-understanding (or the main thing that I think complicates the patch for too little gain - at least initially). Thanks, Richard. Thanks, Richard. In short, that’s all. Thanks, Evgeniya On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported by contrib/compare-debug (I mean we talk about stripped files). Then I compared assemblers and lim dump files and made a conclusion the difference is: There is statement _484=something, which is conditionally moved out of loop X. In non debug cases no usages of _484 are left inside the loop X. In debug case, there is the single use in debug statement. So for debug case my patch generates phi statement for _484 to make it available inside the loop X. For non debug case, no such phi statement generated as there is no uses of _484. There is one more statement with lhs=_493, with exactly this pattern of use. But this is not important for our discussion. So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. My temporary hack fix is just removing of such debug statements (no debug statements, no problems J
Re: conditional lim
? Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n (and making it unconditionally execute before the loop). In fact whether we can move it does not depend on whether 'cond' is invariant or not. If we couldn't execute m*n unconditionally on the loop preheader edge then we'd have to be able to also move the guarding condition. Note that cost considerations shouldn't come into play here. So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). Don't understand So I was only thinking of the case where we also want to move the condition (which is wanted if we want to move a PHI node). What about the case when we want to move condition but don't have phi node to move. Let's consider a little bit modified example: void bar (long); void foo (int cond1, int cond2, int cond3, int m, int n) { unsigned x; unsigned inv1; for (x = 0; x 1000; ++x) { if (n) { int tmp_inv1 = m/n; non_inv1 = tmp_inv1 + x; } } bar (non_inv1); } We couldn't move tmp_inv1 uncoditionally. Right (I have no clean build at hands right now to check. If I'm wrong let's replace m/n to something that couldn't be unconditionally moved :))? There is no phi node for tmp_inv1. Could you clarify what conditional lim should do in your opinion here? Is it too little gain to move statements that only used in the same bb or branch (I mean before phi node creation, even if stmt had phi node)? In fact, we could have here 2 thousands of divisions: if (n) { int tmp_inv1 = m/n; tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1); ... tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1); non_inv1 = tmp_invN + x; } 3.3 replace_uses_in_conditions After computations movements we can have several copies of the cond stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For, cond stmt it doesn’t work. As cond stmt, of course, have something in lim_aux_data-tgt_level, but, in fact, they are moved only if corresponding invariants are moved. So, in fact, the same cond (copies of it, of course) could go to the different levels. So to fix these uses, we call replace_uses_in_conditions. Hmm, but stmts that a condition depends on always have at least the target level of the condition statement. Thus their computation is available in all other same conditon statements that might appear in more inner loop levels, no? So isn't this just about performing the movement in a proper ordering? Let me give your more details, then may be you repeat question in other words. for () { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } So, the order for movement is: inv1; inv3; inv4. When we move inv1 we have tgt_level for inv3 and inv4 computed but we have not created copies of 'if (inv1 inv2)' created. In theory, we can have several copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in initial cycle. if (1) is in tgt level of inv1 we need no replacement, in other cases we need replacements. Of course, we know this(all the info about tgt_levels) when we move inv1, but then we need to create all needed copies (store it somehow, etc) of 'if (inv1 inv2') and make replacement in these copies. I don't see now why this is much better. This doesn't mean that current solution is perfect, just clarify your thought. I still don't get what replacements actually mean. Let's extend the example to for () (A) { for () (B) { if (a) { inv1 = some_computations; if (inv1 inv2) { inv3; inv4; } } } } then you say the target level of inv1 could be 'A' and the target level of inv3 'B' and that of inv4 is 'B' as well. That's true. So we move inv1 to before 'A' and inv3/inv4 to before 'B' guarded with if (inv1 inv2). I fail to see what copies we need here. 'inv1' is available in all target levels below its own, no copies needed. I meant the copies of conditions (COND_STMT: inv1 inv2). Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3) = A; tgt_level(inv4) = B. Then if (a) inv1=some_computations; if (inv1 inv2) inv3; preheader edge for A, where phi for inv1(phi_for_inv1) is computed for () (A) { if (phi_for_inv1 inv2) inv3 for () (B) { if (a) { if (phi_for_inv1 inv2) { } } } So, we have 3 copies of inv1 inv2 after lim (in some of them we use inv1, in other phi_for_inv1). But this is the second priority for now (this is all about whether we should move statements w/o phi node). More details on 3.1 and 3.2 3.1 The patch is very similar for phi stmts and regular stmts
Re: conditional lim
). Add_info_for_phi calls create_tmp_var which requires some explanation. Create_tmp_var To create new names for phi nodes and to create default def for phi arguments I use make_ssa_name and get_or_create_ssa_default_def. These methods have some asserts (being merged): (TREE_CODE (old_var) == VAR_DECL || TREE_CODE (old_var) == PARM_DECL || TREE_CODE (old_var) == RESULT_DECL). So in some cases (when I know that I need to use methods mentioned above, see the start of create_tmp_var, ie the uses in other loops) I create new tmp variable to overcome these asserts. To be honest I don’t like this but don’t know how to deal with this better. Hmm, don't you just run into SSA names here? That is, TREE_CODE (old_var) == SSA_NAME? You can use make_ssa_name (TREE_TYPE (old_var), plim) in this case. Or you run into the issue that anonymous SSA names cannot have default defintions (I don't understand why you need all the fuss about default defs). And a couple of words regarding where we store info for phi nodes: Each loop contains in its aux phi_data structure. On this stage we only add there stmt (if phi node will be required), phi name=phi node result name in names_map or fl_names_map). See the comment about phi_data in patch. If there should be created several phi nodes for given lhs (I mean the first place for phi node doesn’t dominate on corresponding preheader, we add only the last name in names_map (as intermediate names could be created later, but the last name in bb which dominates preheader should be used for replacement in other loops. Replacements were already made in walker). If lhs is used only in phi node, which are moved and transformed into assignment, and there is no uses in other loops, we need to create only first level phi node (don’t need to create phi nodes till some bb which dominates preheader), then we add such name to fl_names_map. 3.2 Create_phi_nodes For each of the loops we go over ((phi_data*) loop-aux)-stmts. These are statements which were moved, but they have uses in the original loop (more exactly, their uses in some other loops replaced by some name from ((phi_data*) loop-aux)-names_map or ((phi_data*) loop-aux)-fl_names_map. So for each of such stmts we create phi nodes (for its lhs) chain. Basic block for phi node is found in get_bb_for_phi for given bb with stmt. If basic block for phi node dominates preheader-src we end chain creation. There is some special condition for the case when we need to create only first level phi node. (I described this situation above, but let me know If I need to add more details. Basic blocks can have maps to identify if we created phi node for given variable in given basic_block, so we only add an edge argument for such case (phi_map = new hash_mapbasic_block, hash_maptree, gphi**). Ok. Well - I think it might be easier code-generation-wise to have a separate phase move_phis executed before move_computations, that just moves invariant PHI nodes (which this all is about - see above) and their controlling conditions and only then move their dependencies. Thanks, Richard. In short, that’s all. Thanks, Evgeniya On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working
Re: conditional lim
defs). I used default defs to write something in phi args when I have no values on corresponding branches. Say there is some tmp value which I should create phi for. if (a) { tmpvar=something b = something(tmpvar); } else { } When I move if (a) {tmpvar = something} and create phi for tmpvar (to use phi result in the initial cycle) i need to write something for 'not a' phi arg. BUT, in some cases I use build_zero_cst (which I started to use after default defs use. I mean I didn't know about build_zero_cst while I started to use default defs.). So may be I can replace uses of default defs in such cases by build_zero_cst? I mean just SOME values (as I know that control flow will never go to this places actually). I mean if !a tmpvar will not used. And a couple of words regarding where we store info for phi nodes: Each loop contains in its aux phi_data structure. On this stage we only add there stmt (if phi node will be required), phi name=phi node result name in names_map or fl_names_map). See the comment about phi_data in patch. If there should be created several phi nodes for given lhs (I mean the first place for phi node doesn’t dominate on corresponding preheader, we add only the last name in names_map (as intermediate names could be created later, but the last name in bb which dominates preheader should be used for replacement in other loops. Replacements were already made in walker). If lhs is used only in phi node, which are moved and transformed into assignment, and there is no uses in other loops, we need to create only first level phi node (don’t need to create phi nodes till some bb which dominates preheader), then we add such name to fl_names_map. 3.2 Create_phi_nodes For each of the loops we go over ((phi_data*) loop-aux)-stmts. These are statements which were moved, but they have uses in the original loop (more exactly, their uses in some other loops replaced by some name from ((phi_data*) loop-aux)-names_map or ((phi_data*) loop-aux)-fl_names_map. So for each of such stmts we create phi nodes (for its lhs) chain. Basic block for phi node is found in get_bb_for_phi for given bb with stmt. If basic block for phi node dominates preheader-src we end chain creation. There is some special condition for the case when we need to create only first level phi node. (I described this situation above, but let me know If I need to add more details. Basic blocks can have maps to identify if we created phi node for given variable in given basic_block, so we only add an edge argument for such case (phi_map = new hash_mapbasic_block, hash_maptree, gphi**). Ok. Well - I think it might be easier code-generation-wise to have a separate phase move_phis executed before move_computations, that just moves invariant PHI nodes (which this all is about - see above) and their controlling conditions and only then move their dependencies. I didn;t understand so far (I think let's clarify previous questions,especially about phi nodes then let's go back to this one.) Thanks, Richard. In short, that’s all. Thanks, Evgeniya On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b
Re: conditional lim
for such case (phi_map = new hash_mapbasic_block, hash_maptree, gphi**). In short, that’s all. Thanks, Evgeniya On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported by contrib/compare-debug (I mean we talk about stripped files). Then I compared assemblers and lim dump files and made a conclusion the difference is: There is statement _484=something, which is conditionally moved out of loop X. In non debug cases no usages of _484 are left inside the loop X. In debug case, there is the single use in debug statement. So for debug case my patch generates phi statement for _484 to make it available inside the loop X. For non debug case, no such phi statement generated as there is no uses of _484. There is one more statement with lhs=_493, with exactly this pattern of use. But this is not important for our discussion. So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. My temporary hack fix is just removing of such debug statements (no debug statements, no problems J). The alternatives I see are: - Move debug statements outside the loop (not good in my opinion); - Somehow reset values in debug statements (not good in my opinion, if I could provide correct information for them); - Generate phi for debug statements and fix bootstrap (say, let’s have some list of files, which we have special check for. I mean for my case, the difference is not in stage2 and stage3, it’s in debug and non-debug). I like this variant
Re: conditional lim
On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Richard Thanks for review starting. Do you see any major issues with this patch (i.e. algorithms and ideas that should be completely replaced, effectively causing the re-write of most code)? To decide if there are major issues in the patch, perhaps, you need additional clarifications from me? Could you point at the places where additional explanations could save you most effort? Your answers to these questions are looking the first priority ones. You wrote about several issues in the code, which are looking as easy (or almost easy ;) to fix(inline functions, unswitch-loops flag, comments, etc). But, I think you agree, let’s first decide about the major issues (I mean, whether we continue with this patch or starting new one, this will save a lot of time for both of us). I didn't get an overall idea on how the patch works, that is, how it integrates with the existing algorithm. If you can elaborate on that a bit that would be helpful. I think the code-generation part needs some work (whether by following my idea with re-using copy_bbs or whether by basically re-implementing it is up to debate). How does your code handle for () { if (cond1) { if (cond2) invariant; if (cond3) invariant; } } ? Optimally we'd have before the loop exactly the same if () structure (thus if (cond1) is shared). Richard. Thanks, Evgeniya On Tue, May 26, 2015 at 2:31 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported
Re: conditional lim
On Wed, May 27, 2015 at 2:11 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Richard Thanks for review starting. Do you see any major issues with this patch (i.e. algorithms and ideas that should be completely replaced, effectively causing the re-write of most code)? To decide if there are major issues in the patch, perhaps, you need additional clarifications from me? Could you point at the places where additional explanations could save you most effort? Your answers to these questions are looking the first priority ones. You wrote about several issues in the code, which are looking as easy (or almost easy ;) to fix(inline functions, unswitch-loops flag, comments, etc). But, I think you agree, let’s first decide about the major issues (I mean, whether we continue with this patch or starting new one, this will save a lot of time for both of us). I didn't get an overall idea on how the patch works, that is, how it integrates with the existing algorithm. If you can elaborate on that a bit that would be helpful. Hi, Sure, I'll write you some notes in several days. I think the code-generation part needs some work (whether by following my idea with re-using copy_bbs or whether by basically re-implementing it is up to debate). How does your code handle for () { if (cond1) { if (cond2) invariant; if (cond3) invariant; } } ? Optimally we'd have before the loop exactly the same if () structure (thus if (cond1) is shared). If both invariants are going out of the same loop (i mean tgt_level), then if structure will be the same. for1() for () { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to for1() if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } for () { if (cond1) { if (cond2); if (cond3); } } (I don't cleanup empty if's in lim code). If these invarians are moved in different loops then for1 for2() for() { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to: for1 { if (cond1) if (cond2) invariant1; for2() { if (cond1) if (cond3) invariant2; for() { if (cond1) { if (cond2); if (cond3); } } } } Of course, there could be some bugs, but the idea was as mentioned above. This transformation was looking logical to me. What do you think? Thanks, Evgeniya Richard. Thanks, Evgeniya On Tue, May 26, 2015 at 2:31 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc
Re: conditional lim
On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported by contrib/compare-debug (I mean we talk about stripped files). Then I compared assemblers and lim dump files and made a conclusion the difference is: There is statement _484=something, which is conditionally moved out of loop X. In non debug cases no usages of _484 are left inside the loop X. In debug case, there is the single use in debug statement. So for debug case my patch generates phi statement for _484 to make it available inside the loop X. For non debug case, no such phi statement generated as there is no uses of _484. There is one more statement with lhs=_493, with exactly this pattern of use. But this is not important for our discussion. So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. As Andi said code generation needs to be the same. My temporary hack fix is just removing of such debug statements (no debug statements, no problems J). That's fine and generally what is done in other passes. The alternatives I see are: - Move debug statements outside the loop (not good in my opinion); - Somehow reset values in debug statements (not good in my opinion, if I could provide correct information for them); You could re-set them via gimple_debug_bind_reset_value which will tell the user that at this point the value is optimized out (that's slightly better than removing the debug stmts). - Generate phi for debug statements and fix bootstrap (say, let’s
Re: conditional lim
Hi, Richard Thanks for review starting. Do you see any major issues with this patch (i.e. algorithms and ideas that should be completely replaced, effectively causing the re-write of most code)? To decide if there are major issues in the patch, perhaps, you need additional clarifications from me? Could you point at the places where additional explanations could save you most effort? Your answers to these questions are looking the first priority ones. You wrote about several issues in the code, which are looking as easy (or almost easy ;) to fix(inline functions, unswitch-loops flag, comments, etc). But, I think you agree, let’s first decide about the major issues (I mean, whether we continue with this patch or starting new one, this will save a lot of time for both of us). Thanks, Evgeniya On Tue, May 26, 2015 at 2:31 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported by contrib/compare-debug (I mean we talk about stripped files). Then I compared assemblers and lim dump files and made a conclusion the difference is: There is statement _484=something, which is conditionally moved out of loop X. In non debug cases no usages of _484 are left inside the loop X. In debug case, there is the single use in debug statement. So for debug case my patch generates phi statement for _484 to make it available inside the loop X. For non debug case, no such phi statement generated as there is no uses of _484. There is one more statement with lhs=_493, with exactly this pattern of use. But this is not important for our discussion. So, in my opinion it’s ok to generate additional phi
[patch] conditional lim
[subj corrected] Hi, Andy Thanks for your clarification. Then I'll try to implement some chain of debug statements to make the value computed in the branch available inside the loop. Or may be you have better ideas? Waiting for further comments regarding the whole patch from you, Richard and other developers. Thanks, Evgeniya On Sat, May 9, 2015 at 4:41 PM, Andi Kleen a...@firstfloor.org wrote: Evgeniya Maenkova evgeniya.maenk...@gmail.com writes: So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. gcc has such a requirement. Otherwise you cannot debug code that was originally not compiled with debugging. -Andi -- a...@linux.intel.com -- Speaking for myself only -- Thanks, Evgeniya perfstories.wordpress.com
Re: conditional lim
Evgeniya Maenkova evgeniya.maenk...@gmail.com writes: So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. gcc has such a requirement. Otherwise you cannot debug code that was originally not compiled with debugging. -Andi -- a...@linux.intel.com -- Speaking for myself only
conditional lim
Hi, Could you please review my patch for predicated lim? Let me note some details about it: 1) Phi statements are still moved only if they have 1 or 2 arguments. However, phi statements could be move under conditions (as it’s done for the other statements). Probably, phi statement motion with 3 + arguments could be implemented in the next patch after predicated lim. 2) Patch has limitations/features like (it was ok to me to implement it such way, maybe I’m not correct. ): a) Loop1 { If (a) Loop2 { Stmt - Invariant for Loop1 } } In this case Stmt will be moved only out of Loop2, because of if (a). b) Or Loop1 { … If (cond1) If (cond2) If (cond3) Stmt; } Stmt will be moved out only if cond1 is always executed in Loop1. c) It took me a long time to write all of these code, so there might be other peculiarities which I forgot to mention. :) Let’s discuss these ones as you will review my patch. 3) Patch consists of 9 files: a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as predicated lim moves 2 more statements out of the loop; - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional lim recip optimization in this test doesn’t work (the corresponding value is below threshold as I could see in the code for recip, 13). So to have recip working in this test I changed test a little bit. b) gcc/tree-ssa-loop-im.c – the patched lim per se c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c the tests for conditional lim. 4) Patch testing: a) make –k check (no difference in results for me for the clean build and the patched one, - Revision: 222849, - uname -a Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux b) Bootstrap. It goes well now, however to fix it I have made a temporary hack in the lim code. And with this fix patch definitely shouldn’t be committed. I did so, as I would like to discuss this issue first. The problem is: I got stage2-stage3 comparison failure on the single file (tree-vect-data-refs.o). After some investigation I understood that tree-vect-data-refs.o differs being compiled with and without ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, and for stage 3 this is ‘-g –O2’. But to simplify things I can reproduce this difference on the same build (even not bootstrapped), with option ‘ –g’ and without it). Of course, the file compiled with –g option will contain debug information and will differ from the corresponding file without debug information. I mean there is the difference reported by contrib/compare-debug (I mean we talk about stripped files). Then I compared assemblers and lim dump files and made a conclusion the difference is: There is statement _484=something, which is conditionally moved out of loop X. In non debug cases no usages of _484 are left inside the loop X. In debug case, there is the single use in debug statement. So for debug case my patch generates phi statement for _484 to make it available inside the loop X. For non debug case, no such phi statement generated as there is no uses of _484. There is one more statement with lhs=_493, with exactly this pattern of use. But this is not important for our discussion. So, in my opinion it’s ok to generate additional phi node for debug case. But I’m not a compiler expert and maybe there is some requirement that debug and non-debug versions should differ only by debug statements, I don’t know. My temporary hack fix is just removing of such debug statements (no debug statements, no problems J). The alternatives I see are: - Move debug statements outside the loop (not good in my opinion); - Somehow reset values in debug statements (not good in my opinion, if I could provide correct information for them); - Generate phi for debug statements and fix bootstrap (say, let’s have some list of files, which we have special check for. I mean for my case, the difference is not in stage2 and stage3, it’s in debug and non-debug). I like this variant. However, as I don’t see such list of special files in the whole gcc, I think I might be missing something. What do you think? Thanks, Evgeniya patch_May8_3 Description: Binary data