On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> I saw the sources of these functions, but I can't understand why I
>> should use something else? Note that all predicate computations are
>> located in basic blocks (by design of if-conv) and there is a special
>> function that puts these computations in bb
>> (insert_gimplified_predicates). An edge contains only the predicate, not
>> its computations. The new function find_insertion_point() does a very
>> simple search: it finds the latest (in current bb) operand def-stmt of
>> predicates taken from all incoming edges.
>> In the original algorithm the predicate of a non-critical edge is taken to
>> perform phi-node predication since for a critical edge it does not work
>> properly.
>>
>> My question is: do your comments mean that I should re-design my
>> extensions?
>
> Well, we have infrastructure for inserting code on edges and you've
> made critical edges predicated correctly. So why re-invent the wheel?
> I realize this is very similar to my initial suggestion to simply split
> critical edges in loops you want to if-convert but delays splitting
> until it turns out to be necessary (which might be good for the
> !force_vect case).
>
> For edge predicates you simply can emit their computation on the
> edge, no?
>
> Btw, I very originally suggested to rework if-conversion to only
> record edge predicates - having both block and edge predicates
> somewhat complicates the code and makes it harder to
> maintain (thus also the suggestion to simply split critical edges
> if necessary to make BB predicates work always).
>
> Your patches add a lot of code and to me it seems we can avoid
> doing so much special casing.

For example attacking the critical edge issue by a simple

Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c      (revision 216508)
+++ tree-if-conv.c      (working copy)
@@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
         if (EDGE_COUNT (e->src->succs) == 1)
           found = true;
       if (!found)
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "only critical predecessors\n");
-          return false;
-        }
+        split_edge (EDGE_PRED (bb, 0));
     }
 
   return true;

it changes the number of blocks in the loop, so
get_loop_body_in_if_conv_order should probably be re-done with the
above eventually signalling that it created a new block.  Or the above
should populate a vector of edges to split and do that after the loop
calling if_convertible_bb_p.

Richard.

> Richard.
>
>> Thanks.
>> Yuri.
>>
>> BTW Jeff did initial review of my changes related to predicate
>> computation for join blocks. I presented him updated patch with
>> test-case and some minor changes in patch. But still did not get any
>> feedback on it. Could you please take a look also on it?
>>
>>
>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Yes, This patch does not make sense since phi node predication for bb
>>>> with critical incoming edges only performs another function which is
>>>> absent (predicate_extended_scalar_phi).
>>>>
>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>> only but you propose to use it for tree also.
>>>> Did I miss something?
>>>
>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>> if you want easy access to the newly created basic block to push
>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>>
>>>>
>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>> in the future (more precise in patch.4).
>>>>>
>>>>> But the same reasoning applies to this version of the patch when
>>>>> flag_force_vectorize is true!? (insertion point and invalid SSA form)
>>>>>
>>>>> Which means the patch doesn't make sense in isolation?
>>>>>
>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>> (pushing the edge predicate to the newly created block).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Could you please review it.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-21 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_preds_critical_p): New function.
>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>> after adding support for extended predication.
>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> In the current implementation phi node conversion assumes that the bb
>>>>>>> containing the given phi has at least one non-critical incoming
>>>>>>> edge and chooses it to insert predicated code. But if we choose a
>>>>>>> critical edge we need to determine the insert point and insertion
>>>>>>> direction (before/after) since otherwise we can get invalid ssa
>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>> the current patch (I will present this patch later). So I assume that we
>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>> you meant by:
>>>>>>>>>
>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>correctly.
>>>>>>>>>
>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>> nodes will be in next patch.
>>>>>>>>
>>>>>>>> I mean that (2) should not be rejected dependent on
>>>>>>>> flag_force_vectorize.
>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before
>>>>>>>> but with this patch this is fixed. I see no reason to still reject
>>>>>>>> this then even for !flag_force_vectorize.
>>>>>>>>
>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>> is ok.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>
>>>>>>>>> I attached modified patch.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke
>>>>>>>>> add_to_predicate_list if destination block of edge is not always
>>>>>>>>> executed. Set-up predicate for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been
>>>>>>>>>>> fixed.
>>>>>>>>>>> Could you please look at it (I have already sent the patch with
>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>
>>>>>>>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>> +        fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>> +      return false;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> Excess vertical space.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>
>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>
>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>> +
>>>>>>>>>> +static inline bool
>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>> +{
>>>>>>>>>>
>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>
>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>> +    {
>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>> +        return false;
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>
>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> ChangeLog
>>>>>>>>>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke
>>>>>>>>>>> add_to_predicate_list if destination block of edge is not always
>>>>>>>>>>> executed. Set-up predicate for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of
>>>>>>>>>>> phi-block is equal 2 and at least one incoming edge is not critical
>>>>>>>>>>> original algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to
>>>>>>>>>>> false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split
>>>>>>>>>>>> things up more.
>>>>>>>>>>>>
>>>>>>>>>>>> static inline void
>>>>>>>>>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>> {
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>> +         join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>> +         p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>> +         p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>> +          && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>> +        {
>>>>>>>>>>>> +          gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>> +          bc = bb_predicate (dom_bb);
>>>>>>>>>>>> +          gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>
>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So
>>>>>>>>>>>> please split the change to add_to_predicate_list out and compute
>>>>>>>>>>>> post-dominators unconditionally.  Note that you should call
>>>>>>>>>>>> free_dominance_info (CDI_POST_DOMINATORS) at the end of
>>>>>>>>>>>> if-conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>> +
>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>
>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>> pass but need to clear it at the end (best use
>>>>>>>>>>>> clear_aux_for_edges () for that).  So stuff like
>>>>>>>>>>>>
>>>>>>>>>>>> +      extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>> +      if (flag_force_vectorize)
>>>>>>>>>>>> +        true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>
>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>> +         have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>> +          || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset
>>>>>>>>>>>> the loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>> +         supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>> +         For example, such conversion is generated for sequence:
>>>>>>>>>>>> +           _Bool _7, _8, _9;
>>>>>>>>>>>> +           _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>> +           if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +      if (CONVERT_EXPR_P (c)
>>>>>>>>>>>> +          && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>
>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>
>>>>>>>>>>>> -      add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>> -                                 unshare_expr (cond), c2);
>>>>>>>>>>>> +      add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>> +                                 unshare_expr (c2));
>>>>>>>>>>>>
>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>
>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling (I have not
>>>>>>>>>>>> looked at that in detail).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function
>>>>>>>>>>>>> clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. Use call of
>>>>>>>>>>>>> all_edges_are_critical to reject block if-conversion with incoming
>>>>>>>>>>>>> critical edges only if FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of
>>>>>>>>>>>>> phi-block is equal 2 and at least one incoming edge is not critical
>>>>>>>>>>>>> original algorithm is used.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop
>>>>>>>>>>>>> versioning for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of
>>>>>>>>>>>>> edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> here is the reduced patch (part.1), which is now almost half
>>>>>>>>>>>>>> the size.
>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical
>>>>>>>>>>>>>> edges.
>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>     /* Edge E is not critical, use predicate of edge source bb.  */
>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I completely deleted all code related to creation of conditional
>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add
>>>>>>>>>>>>>> this local-dce function in next patch.
>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of
>>>>>>>>>>>>>> conditional scalar reduction can be applied also.
>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with
>>>>>>>>>>>>>> pragma omp simd only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function
>>>>>>>>>>>>>> clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up
>>>>>>>>>>>>>> predicate for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of
>>>>>>>>>>>>>> all_edges_are_critical to reject block if-conversion with incoming
>>>>>>>>>>>>>> critical edges only if FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of
>>>>>>>>>>>>>> phi-block is equal 2 and at least one incoming edge is not critical
>>>>>>>>>>>>>> original algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from
>>>>>>>>>>>>>> current loop or outer loop (to support pragma omp declare). Do loop
>>>>>>>>>>>>>> versioning for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field
>>>>>>>>>>>>>> of edges for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener
>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev
>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for
>>>>>>>>>>>>>>>> extended conversion.
>>>>>>>>>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge,
>>>>>>>>>>>>>>>> i.e. negate_predicate was deleted.
>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing
>>>>>>>>>>>>>>>> edges can be critical.
>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join
>>>>>>>>>>>>>>>> basic blocks to simplify it.
>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead to
>>>>>>>>>>>>>>>> generating a chain of cond expressions for phi-node if conversion,
>>>>>>>>>>>>>>>> whereas for phi of kind
>>>>>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as
>>>>>>>>>>>>>>>> simple optimization for arbitrary phi-function. More precisely,
>>>>>>>>>>>>>>>> if phi-function has only two different arguments and one of
>>>>>>>>>>>>>>>> them has a single occurrence, if-conversion is performed as if
>>>>>>>>>>>>>>>> phi had only 2 arguments.
>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is
>>>>>>>>>>>>>>>> produced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once
>>>>>>>>>>>>>>> which makes it hard to review.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition to that it changes function signatures without
>>>>>>>>>>>>>>> updating the function comments.  For example what is the
>>>>>>>>>>>>>>> convert_bool argument doing to add_to_dst_predicate_list?  Why do
>>>>>>>>>>>>>>> we need all this added logic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an
>>>>>>>>>>>>>>> obvious candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate
>>>>>>>>>>>>>>> computations.
>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to
>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by
>>>>>>>>>>>>>>> introducing forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply
>>>>>>>>>>>>>>> these CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing
>>>>>>>>>>>>>>> that).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect
>>>>>>>>>>>>>>>> function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate
>>>>>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true
>>>>>>>>>>>>>>>> and such bb exists; save it in static variable for further
>>>>>>>>>>>>>>>> possible use.
>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is
>>>>>>>>>>>>>>>> true.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add early function exit if edge target block is always
>>>>>>>>>>>>>>>> executed.
>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is
>>>>>>>>>>>>>>>> true.
>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two
>>>>>>>>>>>>>>>> args if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on
>>>>>>>>>>>>>>>> flag_force_vectorize.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
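The ChangeLog above leans on the notion of a critical edge (all_edges_are_critical, if_convertible_bb_p): an edge whose source block has more than one successor and whose destination block has more than one predecessor. A minimal sketch of that test on a toy CFG follows. The types and names (toy_bb, toy_edge, edge_is_critical, all_preds_critical) are hypothetical and only model the idea; they are not GCC's basic_block/edge types and not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy CFG types, not GCC's basic_block / edge.  */
struct toy_bb
{
  int n_succs;   /* number of outgoing edges */
  int n_preds;   /* number of incoming edges */
};

struct toy_edge
{
  const struct toy_bb *src;
  const struct toy_bb *dest;
};

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
int
edge_is_critical (const struct toy_edge *e)
{
  assert (e && e->src && e->dest);
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* Return 1 if every incoming edge in PREDS is critical, i.e. the case
   in which no non-critical edge is available to take a predicate from.
   Returns 0 for an empty list.  */
int
all_preds_critical (const struct toy_edge *preds, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!edge_is_critical (&preds[i]))
      return 0;
  return n > 0;
}
```

Splitting a critical edge inserts a new block with one predecessor and one successor on it, so neither of the two resulting edges is critical any more.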
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener
>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev
>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form
>>>>>>>>>>>>>>>>>> of extended if-conversion of loops marked with this pragma. These
>>>>>>>>>>>>>>>>>> extensions include:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if the considered loop or its
>>>>>>>>>>>>>>>>>> outer loop was marked with pragma omp simd (force_vectorize);
>>>>>>>>>>>>>>>>>> for ordinary loops behavior was not changed.
>>>>>>>>>>>>>>>>>> 2. Removed the CFG restriction on basic blocks, which can now have
>>>>>>>>>>>>>>>>>> more than 2 predecessors.
>>>>>>>>>>>>>>>>>> 3. Added a restriction on phi nodes that was missing in the
>>>>>>>>>>>>>>>>>> current design:
>>>>>>>>>>>>>>>>>> all phi nodes must be in non-predicated basic blocks to conform
>>>>>>>>>>>>>>>>>> to the semantics of COND_EXPR, which is used for the transformation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How is that so? If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2
>>>>>>>>>>>>>>>>>> arguments with some limitations:
>>>>>>>>>>>>>>>>>> - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>> arguments are different and one of them has the only occurrence,
>>>>>>>>>>>>>>>>>> transformation to single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>> - if a phi node has more than two different arguments and all edge
>>>>>>>>>>>>>>>>>> predicates corresponding to the phi arguments are disjoint, a chain
>>>>>>>>>>>>>>>>>> of COND_EXPRs will be generated for it. The current design uses a
>>>>>>>>>>>>>>>>>> very simple check: starting from the end, check that the two edges
>>>>>>>>>>>>>>>>>> corresponding to neighboring arguments have a common predecessor,
>>>>>>>>>>>>>>>>>> which is then used for the check with the next edge.
>>>>>>>>>>>>>>>>>> These checks guarantee that phi predication will produce the
>>>>>>>>>>>>>>>>>> correct result.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>> inserting forwarder blocks. Thus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> which means that 3) has to work. Note that we want this kind of
>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
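The two PHI cases discussed above can be modelled in plain C, with each PHI argument selected by the predicate of its incoming edge. This is only a sketch under stated assumptions: the names phi_arg, select_single and select_chain are hypothetical and do not come from tree-if-conv.c, and the edge predicates are assumed to be disjoint with exactly one of them true.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one PHI argument: the value arriving on an
   incoming edge plus that edge's predicate (nonzero if taken).  */
struct phi_arg
{
  int value;
  int pred;
};

/* Chain of selects, modelling a chain of COND_EXPRs.  Assumes the
   predicates are disjoint and exactly one is true.  The last argument
   acts as the fallback, so its predicate is never tested.  */
int
select_chain (const struct phi_arg *args, size_t n)
{
  assert (n > 0);
  int result = args[n - 1].value;
  for (size_t i = n - 1; i-- > 0;)
    result = args[i].pred ? args[i].value : result;
  return result;
}

/* Case with only two distinct values where one of them occurs exactly
   once (at index UNIQUE): a single select on that argument's predicate
   is enough, whatever the number of arguments.  */
int
select_single (const struct phi_arg *args, size_t n, size_t unique)
{
  assert (n >= 2 && unique < n);
  size_t other = (unique == 0) ? 1 : 0;
  return args[unique].pred ? args[unique].value : args[other].value;
}
```

With arguments modelled on x = PHI <1(2), 1(3), 2(4)>, select_single needs only the predicate of the edge from block 4, while a PHI such as <1(2), 2(3), 3(4)> needs the two-step select_chain, mirroring the forwarder-block unfactoring.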
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>> critical edge limitation? Please instead change if-conversion to
>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is an example of such extended predication (compile with
>>>>>>>>>>>>>>>>>> -march=core-avx2):
>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>> <bb 4>:
>>>>>>>>>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>> t_5 = a[i_16];
>>>>>>>>>>>>>>>>>> _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>> _8 = _7 & _6;
>>>>>>>>>>>>>>>>>> _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>> _10 = &c[i_16];
>>>>>>>>>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>> _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>> _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>> res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>> i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>> if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>   goto <bb 4>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on
>>>>>>>>>>>>>>>>>> applicability of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through
>>>>>>>>>>>>>>>>>> phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of
>>>>>>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
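For readers who want the effect of the example loop in this message without reading the GIMPLE dump, here is a hand-written scalar sketch of the same idea: the nested conditions become a mask, and the conditional increment becomes an unconditional add of a selected value. It is not the output of the patch. The function names count_branchy and count_predicated and the LOOP_N macro are hypothetical, and the masked load is only modelled by a select that does not read c[i] when the mask is false.

```c
#include <assert.h>

#define LOOP_N 512

/* The loop as written in the example, with nested conditions.  */
int
count_branchy (const float *a, const int *c)
{
  assert (a && c);
  int res = 0;
  for (int i = 0; i < LOOP_N; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* Hand if-converted form: no control flow in the loop body.  */
int
count_predicated (const float *a, const int *c)
{
  assert (a && c);
  int res = 0;
  for (int i = 0; i < LOOP_N; i++)
    {
      float t = a[i];
      int mask = (t > 0) & (t < 1.0e+17f);
      /* Stand-in for MASK_LOAD: c[i] is read only when MASK is set.  */
      int ci = mask ? c[i] : 0;
      res += (mask & (ci != 0)) ? 1 : 0;
    }
  return res;
}
```

Both functions return the same count for the same inputs; the second form has a straight-line loop body, which is the shape the vectorizer needs.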