On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> I saw the sources of these functions, but I can't understand why I
>> should use something else? Note that all predicate computations are
>> located in basic blocks ( by design of if-conv) and there is special
>> function that put these computations in bb
>> (insert_gimplified_predicates). Edge contains only predicate not its
>> computations. New function - find_insertion_point() does very simple
>> search - it finds out the latest (in current bb) operand def-stmt of
>> predicates taken from all incoming edges.
>> In original algorithm the predicate of non-critical edge is taken to
>> perform phi-node predication since for critical edge it does not work
>> properly.
>>
>> My question is: does your comments mean that I should re-design my 
>> extensions?
>
> Well, we have infrastructure for inserting code on edges and you've
> made critical edges predicated correctly.  So why re-invent the wheel?
> I realize this is very similar to my initial suggestion to simply split
> critical edges in loops you want to if-convert but delays splitting
> until it turns out to be necessary (which might be good for the
> !force_vect case).
>
> For edge predicates you simply can emit their computation on the
> edge, no?
>
> Btw, I very originally suggested to rework if-conversion to only
> record edge predicates - having both block and edge predicates
> somewhat complicates the code and makes it harder to
> maintain (thus also the suggestion to simply split critical edges
> if necessary to make BB predicates work always).
>
> Your patches add a lot of code and to me it seems we can avoid
> doing so much special casing.

For example attacking the critical edge issue by a simple

Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c      (revision 216508)
+++ tree-if-conv.c      (working copy)
@@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
        if (EDGE_COUNT (e->src->succs) == 1)
          found = true;
       if (!found)
-       {
-         if (dump_file && (dump_flags & TDF_DETAILS))
-           fprintf (dump_file, "only critical predecessors\n");
-         return false;
-       }
+       split_edge (EDGE_PRED (bb, 0));
     }

   return true;

it changes the number of blocks in the loop, so
get_loop_body_in_if_conv_order should probably be re-done with the
above eventually signalling that it created a new block.  Or the above
should populate a vector of edges to split and do that after the
loop calling if_convertible_bb_p.

Richard.

> Richard.
>
>> Thanks.
>> Yuri.
>>
>> BTW Jeff did initial review of my changes related to predicate
>> computation for join blocks. I presented him updated patch with
>> test-case and some minor changes in patch. But still did not get any
>> feedback on it. Could you please take a look also on it?
>>
>>
>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Yes, This patch does not make sense since phi node predication for bb
>>>> with critical incoming edges only performs another function which is
>>>> absent (predicate_extended_scalar_phi).
>>>>
>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>> only but you propose to use it for tree also.
>>>> Did I miss something?
>>>
>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>> if you want easy access to the newly created basic block to push
>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>>
>>>>
>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>> in the future (more precise in patch.4).
>>>>>
>>>>> But the same reasoning applies to this version of the patch when
>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>
>>>>> Which means the patch doesn't make sense in isolation?
>>>>>
>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>> (pushing the edge predicate to the newly created block).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Could you please review it.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_preds_critical_p): New function.
>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>> after adding support for extended predication.
>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>> did you mean by:
>>>>>>>>>
>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>correctly.
>>>>>>>>>
>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>> nodes will be in next patch.
>>>>>>>>
>>>>>>>> I mean that (2) should not be rejected dependent on 
>>>>>>>> flag_force_vectorize.
>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but 
>>>>>>>> with
>>>>>>>> this patch this is fixed.  I see no reason to still reject this then 
>>>>>>>> even
>>>>>>>> for !flag_force_vectorize.
>>>>>>>>
>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>> is ok.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>
>>>>>>>>> I attached modified patch.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke 
>>>>>>>>> add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev 
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been 
>>>>>>>>>>> fixed.
>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>
>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>> +               fprintf (dump_file, "More than two phi node 
>>>>>>>>>> args.\n");
>>>>>>>>>> +             return false;
>>>>>>>>>> +           }
>>>>>>>>>> +
>>>>>>>>>> +        }
>>>>>>>>>>
>>>>>>>>>> Excess vertical space.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>
>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>
>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>> +
>>>>>>>>>> +static inline bool
>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>> +{
>>>>>>>>>>
>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>
>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>> +    {
>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>> +       return false;
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>
>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> ChangeLog
>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke 
>>>>>>>>>>> add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up 
>>>>>>>>>>> predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of 
>>>>>>>>>>> phi-block
>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to 
>>>>>>>>>>> false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener 
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split 
>>>>>>>>>>>> things up
>>>>>>>>>>>> more.
>>>>>>>>>>>>
>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>  {
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier 
>>>>>>>>>>>> predicate for
>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with 
>>>>>>>>>>>> predicates
>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead 
>>>>>>>>>>>> of
>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) 
>>>>>>>>>>>> == bb)
>>>>>>>>>>>> +       {
>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>
>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So 
>>>>>>>>>>>> please
>>>>>>>>>>>> split the change to add_to_predicate_list out and compute 
>>>>>>>>>>>> post-dominators
>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>> +
>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>
>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges 
>>>>>>>>>>>> ()
>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>
>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, 
>>>>>>>>>>>> &false_edge);
>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>
>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed 
>>>>>>>>>>>> and
>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset 
>>>>>>>>>>>> the
>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>
>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>> +             supported by vectorizer, so re-build it without 
>>>>>>>>>>>> folding.
>>>>>>>>>>>> +            For example, such conversion is generated for 
>>>>>>>>>>>> sequence:
>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>
>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>
>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, 
>>>>>>>>>>>> unshare_expr (cond),
>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>
>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>
>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not 
>>>>>>>>>>>> looked at
>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function 
>>>>>>>>>>>>> clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of 
>>>>>>>>>>>>> all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under 
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of 
>>>>>>>>>>>>> phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop 
>>>>>>>>>>>>> versioning
>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of 
>>>>>>>>>>>>> edges
>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical 
>>>>>>>>>>>>>> edges.
>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) 
>>>>>>>>>>>>>> == 1)
>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. 
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I completely delete all code related to creation of 
>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add 
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of 
>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with 
>>>>>>>>>>>>>> pragma
>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function 
>>>>>>>>>>>>>> clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up 
>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of 
>>>>>>>>>>>>>> all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only 
>>>>>>>>>>>>>> if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under 
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of 
>>>>>>>>>>>>>> phi-block
>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from 
>>>>>>>>>>>>>> current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop 
>>>>>>>>>>>>>> versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field 
>>>>>>>>>>>>>> of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener 
>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for 
>>>>>>>>>>>>>>>> extended conversion.
>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, 
>>>>>>>>>>>>>>>> i.e.
>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing 
>>>>>>>>>>>>>>>> edges can
>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join 
>>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead 
>>>>>>>>>>>>>>>> generating
>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas 
>>>>>>>>>>>>>>>> for phi
>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as 
>>>>>>>>>>>>>>>> simple
>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of 
>>>>>>>>>>>>>>>> them has
>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have 
>>>>>>>>>>>>>>>> only 2
>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is 
>>>>>>>>>>>>>>>> produced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once 
>>>>>>>>>>>>>>> which makes
>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition to that it changes function singatures without 
>>>>>>>>>>>>>>> updating
>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an 
>>>>>>>>>>>>>>> obvious
>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate 
>>>>>>>>>>>>>>> computations
>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through 
>>>>>>>>>>>>>>> creating of
>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate 
>>>>>>>>>>>>>>> computations.
>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to 
>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by 
>>>>>>>>>>>>>>> introducing
>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply 
>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing 
>>>>>>>>>>>>>>> that).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect 
>>>>>>>>>>>>>>>> function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate 
>>>>>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true 
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further 
>>>>>>>>>>>>>>>> possible use.
>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is 
>>>>>>>>>>>>>>>> true.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add early function exit if edge target block is always 
>>>>>>>>>>>>>>>> executed.
>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is 
>>>>>>>>>>>>>>>> true.
>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two 
>>>>>>>>>>>>>>>> args
>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on 
>>>>>>>>>>>>>>>> flag_force_vectorize.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors 
>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of 
>>>>>>>>>>>>>>>> all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges 
>>>>>>>>>>>>>>>> only if
>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to 
>>>>>>>>>>>>>>>> transform
>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional 
>>>>>>>>>>>>>>>> expressions
>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following 
>>>>>>>>>>>>>>>> transformation:
>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if 
>>>>>>>>>>>>>>>> convert_bool
>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional 
>>>>>>>>>>>>>>>> argument
>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent 
>>>>>>>>>>>>>>>> bb's.
>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of 
>>>>>>>>>>>>>>>> phi-block
>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical 
>>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which 
>>>>>>>>>>>>>>>> signals that
>>>>>>>>>>>>>>>> phi arguments must be evaluated through 
>>>>>>>>>>>>>>>> phi_has_two_different_args.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp 
>>>>>>>>>>>>>>>> if cond
>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of 
>>>>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple 
>>>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for 
>>>>>>>>>>>>>>>> insertion.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated 
>>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. 
>>>>>>>>>>>>>>>> Insert
>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from 
>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop 
>>>>>>>>>>>>>>>> versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener 
>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in 
>>>>>>>>>>>>>>>>>> part of
>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These 
>>>>>>>>>>>>>>>>>> extensions
>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or 
>>>>>>>>>>>>>>>>>> its outer
>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); 
>>>>>>>>>>>>>>>>>> For ordinary
>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have 
>>>>>>>>>>>>>>>>>> more than 2
>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed 
>>>>>>>>>>>>>>>>>> in current design:
>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to 
>>>>>>>>>>>>>>>>>> conform
>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result 
>>>>>>>>>>>>>>>>> will be used
>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of 
>>>>>>>>>>>>>>>>> COND_EXPRs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 
>>>>>>>>>>>>>>>>>> arguments
>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but 
>>>>>>>>>>>>>>>>>> only two
>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only 
>>>>>>>>>>>>>>>>>> occurence,
>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge 
>>>>>>>>>>>>>>>>>> predicates
>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of 
>>>>>>>>>>>>>>>>>> COND_EXPR
>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple 
>>>>>>>>>>>>>>>>>> check is used:
>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to 
>>>>>>>>>>>>>>>>>> neighbor
>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further 
>>>>>>>>>>>>>>>>>> check
>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the 
>>>>>>>>>>>>>>>>>> correct result.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI 
>>>>>>>>>>>>>>>>> node by
>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind 
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a 
>>>>>>>>>>>>>>>>> pre-pass
>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG 
>>>>>>>>>>>>>>>>> transform.
>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work 
>>>>>>>>>>>>>>>>> around the
>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change 
>>>>>>>>>>>>>>>>> if-conversion to
>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with 
>>>>>>>>>>>>>>>>>> -march=core-avx2):
>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new 
>>>>>>>>>>>>>>>>>> failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate 
>>>>>>>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument 
>>>>>>>>>>>>>>>>>> is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always 
>>>>>>>>>>>>>>>>>> executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument 
>>>>>>>>>>>>>>>>>> is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two 
>>>>>>>>>>>>>>>>>> args
>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi 
>>>>>>>>>>>>>>>>>> nodes are
>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two 
>>>>>>>>>>>>>>>>>> predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of 
>>>>>>>>>>>>>>>>>> all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges 
>>>>>>>>>>>>>>>>>> only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to 
>>>>>>>>>>>>>>>>>> transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional 
>>>>>>>>>>>>>>>>>> expressions
>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or 
>>>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate 
>>>>>>>>>>>>>>>>>> assignments
>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on 
>>>>>>>>>>>>>>>>>> applicable
>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is 
>>>>>>>>>>>>>>>>>> critical
>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false 
>>>>>>>>>>>>>>>>>> predicates using
>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct 
>>>>>>>>>>>>>>>>>> bb_predicate_s and
>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of 
>>>>>>>>>>>>>>>>>> critical edge,
>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their 
>>>>>>>>>>>>>>>>>> destination
>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with 
>>>>>>>>>>>>>>>>>> additional argument
>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of 
>>>>>>>>>>>>>>>>>> phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical 
>>>>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which 
>>>>>>>>>>>>>>>>>> signals that
>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through 
>>>>>>>>>>>>>>>>>> phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of 
>>>>>>>>>>>>>>>>>> convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of 
>>>>>>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple 
>>>>>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for 
>>>>>>>>>>>>>>>>>> insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated 
>>>>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. 
>>>>>>>>>>>>>>>>>> Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for 
>>>>>>>>>>>>>>>>>> extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from 
>>>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning 
>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

Reply via email to