Richard,

I changed algorithm for bool pattern repair.
It turned out that ifcvt_local_dce phaase is required since for
test-case I sent you in previous mail vectorization is not performed
without dead code elimination:

For the loop
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)
res += 1;
  }

I've got the following message from vectorizer:

t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;

t3.c:10:11: note: bit-precision arithmetic not supported.
t3.c:10:11: note: not vectorized: relevant stmt not supported:
_ifc__39 = t_5 > 0.0;

It is caused by the following dead predicate computations after
critical edge splitting:

(after combine blocks):

<bb 3>:
# res_15 = PHI <res_1(7), 0(19)>
# i_16 = PHI <i_11(7), 0(19)>
# ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
t_5 = a[i_16];
_6 = t_5 > 0.0;
_7 = t_5 < 9.9999998430674944e+16;
_8 = _6 & _7;
_10 = &c[i_16];
_ifc__36 = _8 ? 4294967295 : 0;
_9 = MASK_LOAD (_10, 0B, _ifc__36);
_28 = _8;
_29 = _9 != 0;
_30 = _28 & _29;
// Statements below are dead!!
_31 = _8;
_32 = _9 != 0;
_33 = ~_32;
_34 = _31 & _33;
// End of dead statements.
_ifc__35 = _30 ? 1 : 0;
res_1 = res_15 + _ifc__35;
i_11 = i_16 + 1;
ivtmp_13 = ivtmp_14 - 1;
if (ivtmp_13 != 0)
  goto <bb 7>;
else
  goto <bb 8>;

But if we delete these statements loop will be vectorized.

New patch is attached.

Thanks.
Yuri.

2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> I am sending you full patch (~1000 lines) but if you need only patch.1
>> and patch.2 will let me know and i'll send you reduced patch.
>>
>> Below are few comments regarding your remarks for patch.3.
>>
>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>> when dead code elimination is required to vectorize loop, i.e. dead
>> statement is marked as relevant.
>> 2. You wrote:
>>> The "retry" code also looks odd - why do you walk the BB multiple
>>> times instead of just doing sth like
>>>
>>>  while (!has_single_use (lhs))
>>>    {
>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>      ifcvt_walk_pattern_tree (copy);
>>>    }
>>>
>>> thus returning the copy you create and re-process it (the copy should
>>> now have a single-use).
>>
>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>> but some intermediate variables too. For example, for the following
>> test-case
>>
>> float a[1000];
>> int c[1000];
>>
>> int foo()
>> {
>>   int i, res = 0;
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)
>> res += 1;
>>   }
>>   return res;
>> }
>>
>> After combine_blocks we have the following bb:
>>
>> <bb 3>:
>> # res_15 = PHI <res_1(7), 0(15)>
>> # i_16 = PHI <i_11(7), 0(15)>
>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>> t_5 = a[i_16];
>> _6 = t_5 > 0.0;
>> _7 = t_5 < 9.9999998430674944e+16;
>> _8 = _6 & _7;
>> _10 = &c[i_16];
>> _ifc__32 = _8 ? 4294967295 : 0;
>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>> _28 = _8;
>> _29 = _9 != 0;
>> _30 = _28 & _29;
>> _ifc__31 = _30 ? 1 : 0;
>> res_1 = res_15 + _ifc__31;
>> i_11 = i_16 + 1;
>> ivtmp_13 = ivtmp_14 - 1;
>> if (ivtmp_13 != 0)
>>   goto <bb 7>;
>> else
>>   goto <bb 8>;
>>
>> and we can see that _8 has multiple uses. Also note that after splitting of
>> _8 = _6 & _7
>> we also get multiple uses for definition of  _6 and _7. So I used this
>> iterative algorithm as the simplest one.
>
> But it walks the entire pattern again and again while you only need to
> ensure you walk the pattern tree of the now single-use DEF again
> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
> you should pass down the user_operand_p that you need to make
> single-use).
>
>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>> for stmt_is_root_of_bool_pattern.
>>
>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>> to check on COND_EXPR only since PHI predication and memory access
>> predication produced only such statements,i.e. it can look like
>>
>> static bool
>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>> {
>>   enum tree_code code;
>>   tree lhs, rhs;
>>
>>   code = gimple_assign_rhs_code (stmt);
>>   if (code == COND_EXPR)
>>     {
>>       rhs = gimple_assign_rhs1 (stmt);
>>       if (TREE_CODE (rhs) != SSA_NAME)
>> return false;
>>       *var = rhs;
>>       return true;
>>     }
>>   return false;
>> }
>>
>> I also did few minor changes in patch.2.
>>
>> 3. You can also notice that I inserted code in tree_if_conversion to
>> do loop version if explicit option "-ftree-loop-if-convert" was not
>> passed to compiler, i.e. we perform if-conversion for loop
>> vectorization only and if it does not take place, we should delete
>> if-converted version of loop.
>> What is your opinion?
>
> Overall part 1 and part 2 look good to me, predicate_scalar_phi
> looks in need of some refactoring to avoid duplicate code.  We can
> do that a followup.
>
> Part 3 still needs the iteration to be resolved and make the use we
> actually care about single-use, not a random one so we can avoid
> iterating completely.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> Here is updated patch which includes
>>>> (1) split critical edges for aggressive if conversion.
>>>> (2) delete all stuff related to support of critical edge predication.
>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>
>>>> I checked that patch works in stress testing mode, i.e. with
>>>> aggressive if conversion by default.
>>>>
>>>> What is your opinion?
>>>
>>> Looks ok overall, but please simply do
>>>
>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>       split_edge (e);
>>>
>>> for all blocks apart from the latch.
>>>
>>> Can you please send a combined patch up to this one?  Looking at
>>> the incremental diff is somewhat hard.  Thus a patch including all
>>> patches from patch1 to this one.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> I didn't understand your point:
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>
>>>>>> but you do it unconditionally in proposed patch.
>>>>>
>>>>> I don't mind means I am fine with it.
>>>>>
>>>>>> Also I assume that
>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>> split headers of loops, loop exit blocks etc.
>>>>>
>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>> be surprised if so but that may be possible.
>>>>>
>>>>>> I prefer to do something
>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>> destination bb belongs to loop).
>>>>>
>>>>> That works for me as well but it is more complicated to implement.
>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>> predecessors (where we'd currently give up).  But note that this
>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>> will change loop->num_nodes so we have to be more careful in
>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>> I have few questions about your comments.
>>>>>>>>
>>>>>>>> 1. You wrote :
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> path
>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>> compile time since it builds hash_map.
>>>>>>>
>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>> non-aggressive if-conversion.
>>>>>>>
>>>>>>>> 2. About critical edge splitting.
>>>>>>>>
>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>
>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>> do something like
>>>>>>>
>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>      return 0;
>>>>>>>
>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>             && !loop->dont_vectorize))
>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>> +      {
>>>>>>> +       if (!critical_edges_split_p)
>>>>>>> +         {
>>>>>>> +           split_critical_edges ();
>>>>>>> +           critical_edges_split_p = true;
>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>> +         }
>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>> +      }
>>>>>>>
>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>    {
>>>>>>>
>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>> additional option.
>>>>>>>
>>>>>>> Yes, I know.
>>>>>>>
>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>> critical edges then I am all for that solution.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>> Yuri.
>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and 
>>>>>>>>>> find_insertion_point.
>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>> and at least one incoming edge
>>>>>>>>>> is critical. This saved iterator can be used by extended phi 
>>>>>>>>>> predication.
>>>>>>>>>>
>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>
>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 5>:
>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 6>:
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 7>:
>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>> #if 0
>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>  else
>>>>>>>>>> #endif
>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>
>>>>>>>>>> we will get ICE:
>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>  void foo (int n)
>>>>>>>>>>       ^
>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>
>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>
>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>
>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>
>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>> predicating a PHI.
>>>>>>>>>
>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>> the printfs properly.
>>>>>>>>>
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> paths.
>>>>>>>>>
>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>
>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> ChangeLog is
>>>>>>>>>>
>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>> statement iterator.
>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>> is predecessor of phi-block since phi may have more than two 
>>>>>>>>>> arguments.
>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition 
>>>>>>>>>> and
>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has 
>>>>>>>>>> more
>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>> (insert_gimplified_predicates): Add bool variable 
>>>>>>>>>> EXTENDED_PREDICATION
>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming 
>>>>>>>>>> edge is
>>>>>>>>>> critical.
>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener 
>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev 
>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>
>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>      recorded here, in order to avoid the duplication of 
>>>>>>>>>>>> computations
>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>
>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>
>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener 
>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev 
>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi 
>>>>>>>>>>>>>> predication
>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical 
>>>>>>>>>>>>>> edge
>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not always insert before the use?  Which would be after 
>>>>>>>>>>>>> labels,
>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you 
>>>>>>>>>>>>> predicate
>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any 
>>>>>>>>>>>>> case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener 
>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of 
>>>>>>>>>>>>>>>> gimple_phi_arg_edge.
>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 
>>>>>>>>>>>>>>>> arguments only
>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for 
>>>>>>>>>>>>>>>> reduction in
>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't 
>>>>>>>>>>>>>>>> do it
>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to 
>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see 
>>>>>>>>>>>>>>>> comments
>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, 
>>>>>>>>>>>>>>> thus
>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its 
>>>>>>>>>>>>>>> inversion
>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( 
>>>>>>>>>>>>>>>> t5.c is
>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi 
>>>>>>>>>>>>>>>> nodes has
>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing 
>>>>>>>>>>>>>>>> edge
>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi 
>>>>>>>>>>>>>>>> result can
>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code 
>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates 
>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to 
>>>>>>>>>>>>>>> such issues,
>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function 
>>>>>>>>>>>>>>> walking
>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener 
>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since 
>>>>>>>>>>>>>>>>>> it may
>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was 
>>>>>>>>>>>>>>>>>> introduced
>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are 
>>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI 
>>>>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to 
>>>>>>>>>>>>>>>>>> detect such phi.
>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption 
>>>>>>>>>>>>>>>>>> that at
>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not 
>>>>>>>>>>>>>>>>>> critical - it
>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of 
>>>>>>>>>>>>>>>>>> normal edge
>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the 
>>>>>>>>>>>>>>>>>> beginning of
>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which 
>>>>>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi 
>>>>>>>>>>>>>>>>>> predication must be
>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced 
>>>>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining 
>>>>>>>>>>>>>>>>>> predicates
>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi 
>>>>>>>>>>>>>>>>>> predication code
>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and 
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after 
>>>>>>>>>>>>>>>>> value
>>>>>>>>>>>>>>>>> and handle equal values specially in 
>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All 
>>>>>>>>>>>>>>>>> SSA names
>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the 
>>>>>>>>>>>>>>>>> complex CFG
>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will 
>>>>>>>>>>>>>>>>> dominate the
>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the 
>>>>>>>>>>>>>>>>> other case.
>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of 
>>>>>>>>>>>>>>>>> course needs
>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens 
>>>>>>>>>>>>>>>>> already?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag 
>>>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args 
>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do 
>>>>>>>>>>>>>>>>> this as
>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize 
>>>>>>>>>>>>>>>>> flag
>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 
>>>>>>>>>>>>>>>>> 'aggressive'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors 
>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose 
>>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get 
>>>>>>>>>>>>>>>>>> phi
>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to 
>>>>>>>>>>>>>>>>>> insert
>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means 
>>>>>>>>>>>>>>>>>> non-extended
>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add 
>>>>>>>>>>>>>>>>>> argument
>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables 
>>>>>>>>>>>>>>>>>> EXTENDED and
>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has 
>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. 
>>>>>>>>>>>>>>>>>> Invoke
>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi 
>>>>>>>>>>>>>>>>>> depending on
>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that 
>>>>>>>>>>>>>>>>>> non-predicated block
>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just 
>>>>>>>>>>>>>>>>>> after label
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of 
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Attachment: patch.20141221
Description: Binary data

Reply via email to