Richard, I am sending you full patch (~1000 lines) but if you need only patch.1 and patch.2 will let me know and i'll send you reduced patch.
Below are few comments regarding your remarks for patch.3. 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case when dead code elimination is required to vectorize loop, i.e. dead statement is marked as relevant. 2. You wrote: > The "retry" code also looks odd - why do you walk the BB multiple > times instead of just doing sth like > > while (!has_single_use (lhs)) > { > gimple copy = ifcvt_split_def_stmt (def_stmt); > ifcvt_walk_pattern_tree (copy); > } > > thus returning the copy you create and re-process it (the copy should > now have a single-use). The problem is that not only top SSA_NAME (lhs) may have multiple uses but some intermediate variables too. For example, for the following test-case float a[1000]; int c[1000]; int foo() { int i, res = 0; #pragma omp simd safelen(8) for (i=0; i<512; i++) { float t = a[i]; if (t > 0.0f & t < 1.0e+17f) if (c[i] != 0) res += 1; } return res; } After combine_blocks we have the following bb: <bb 3>: # res_15 = PHI <res_1(7), 0(15)> # i_16 = PHI <i_11(7), 0(15)> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)> t_5 = a[i_16]; _6 = t_5 > 0.0; _7 = t_5 < 9.9999998430674944e+16; _8 = _6 & _7; _10 = &c[i_16]; _ifc__32 = _8 ? 4294967295 : 0; _9 = MASK_LOAD (_10, 0B, _ifc__32); _28 = _8; _29 = _9 != 0; _30 = _28 & _29; _ifc__31 = _30 ? 1 : 0; res_1 = res_15 + _ifc__31; i_11 = i_16 + 1; ivtmp_13 = ivtmp_14 - 1; if (ivtmp_13 != 0) goto <bb 7>; else goto <bb 8>; and we can see that _8 has multiple uses. Also note that after splitting of _8 = _6 & _7 we also get multiple uses for definition of _6 and _7. So I used this iterative algorithm as the simplest one. I think it would be nice to re-use some utility from tree-vect-patterns.c for stmt_is_root_of_bool_pattern. I assume that function stmt_is_root_of_bool_pattern can be simplified to check on COND_EXPR only since PHI predication and memory access predication produced only such statements,i.e. it can look like static bool stmt_is_root_of_bool_pattern (gimple stmt, tree *var) { enum tree_code code; tree lhs, rhs; code = gimple_assign_rhs_code (stmt); if (code == COND_EXPR) { rhs = gimple_assign_rhs1 (stmt); if (TREE_CODE (rhs) != SSA_NAME) return false; *var = rhs; return true; } return false; } I also did few minor changes in patch.2. 3. You can also notice that I inserted code in tree_if_conversion to do loop version if explicit option "-ftree-loop-if-convert" was not passed to compiler, i.e. we perform if-conversion for loop vectorization only and if it does not take place, we should delete if-converted version of loop. What is your opinion? Thanks. Yuri. 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Hi Richard, >> >> Here is updated patch which includes >> (1) split critical edges for aggressive if conversion. >> (2) delete all stuff related to support of critical edge predication. >> (3) only one function - predicate_scalar_phi performs predication. >> (4) function find_phi_replacement_condition was deleted since it was >> included in predicate_scalar_phi for phi with two arguments. >> >> I checked that patch works in stress testing mode, i.e. with >> aggressive if conversion by default. >> >> What is your opinion? > > Looks ok overall, but please simply do > > FOR_EACH_EDGE (e, ei, bb->succs) > if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) > split_edge (e); > > for all blocks apart from the latch. > > Can you please send a combined patch up to this one? Looking at > the incremental diff is somewhat hard. Thus a patch including all > patches from patch1 to this one. > > Thanks, > Richard. > >> >> Thanks. >> Yuri. >> >> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> Thanks for your reply! >>>> >>>> I didn't understand your point: >>>> >>>> Well, I don't mind splitting all critical edges unconditionally >>>> >>>> but you do it unconditionally in proposed patch. >>> >>> I don't mind means I am fine with it. >>> >>>> Also I assume that >>>> call of split_critical_edges() can break ssa. For example, we can >>>> split headers of loops, loop exit blocks etc. >>> >>> How does that "break SSA"? You mean loop-closed SSA? I'd >>> be surprised if so but that may be possible. >>> >>>> I prefer to do something >>>> more loop-specialized, e.g. call edge_split() for critical edges >>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>>> destination bb belongs to loop). >>> >>> That works for me as well but it is more complicated to implement. >>> Ideally you'd only split one edge if you find a block with only critical >>> predecessors (where we'd currently give up). But note that this >>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >>> will change loop->num_nodes so we have to be more careful in >>> constructing the loop calling if_convertible_bb_p. >>> >>> Richard. >>> >>>> >>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> Sorry that I forgot to delete debug dump from my fix. >>>>>> I have few questions about your comments. >>>>>> >>>>>> 1. You wrote : >>>>>>> You also still have two functions for PHI predication. And the >>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>> path >>>>>> Did you mean that I must combine predicate_scalar_phi and >>>>>> predicate_extended scalar phi to one function? >>>>>> Please note that if additional flag was not set up (i.e. >>>>>> aggressive_if_conv is false) extended predication is required more >>>>>> compile time since it builds hash_map. >>>>> >>>>> It's compile-time complexity is reasonable enough even for >>>>> non-aggressive if-conversion. >>>>> >>>>>> 2. About critical edge splitting. >>>>>> >>>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>>> option only; (2) should we split all critical edges. >>>>>> Note that this leads to recomputing of topological order. >>>>> >>>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>>> do something like >>>>> >>>>> Index: gcc/tree-if-conv.c >>>>> =================================================================== >>>>> --- gcc/tree-if-conv.c (revision 218515) >>>>> +++ gcc/tree-if-conv.c (working copy) >>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>>> if (number_of_loops (fun) <= 1) >>>>> return 0; >>>>> >>>>> + bool critical_edges_split_p = false; >>>>> FOR_EACH_LOOP (loop, 0) >>>>> if (flag_tree_loop_if_convert == 1 >>>>> || flag_tree_loop_if_convert_stores == 1 >>>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>>> && !loop->dont_vectorize)) >>>>> - todo |= tree_if_conversion (loop); >>>>> + { >>>>> + if (!critical_edges_split_p) >>>>> + { >>>>> + split_critical_edges (); >>>>> + critical_edges_split_p = true; >>>>> + todo |= TODO_cleanup_cfg; >>>>> + } >>>>> + todo |= tree_if_conversion (loop); >>>>> + } >>>>> >>>>> #ifdef ENABLE_CHECKING >>>>> { >>>>> >>>>>> It is worth noting that in current implementation bb's with 2 >>>>>> predecessors and both are on critical edges are accepted without >>>>>> additional option. >>>>> >>>>> Yes, I know. >>>>> >>>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>>> to it and even fix the critical edge missed optimization with splitting >>>>> critical edges then I am all for that solution. >>>>> >>>>> Richard. >>>>> >>>>>> Thanks ahead. >>>>>> Yuri. >>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> Here is updated patch2 with the following changes: >>>>>>>> 1. Delete functions phi_has_two_different_args and >>>>>>>> find_insertion_point. >>>>>>>> 2. Use only one function for extended predication - >>>>>>>> predicate_extended_scalar_phi. >>>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>>> blocks if it has 2 predecessors and >>>>>>>> both incoming edges are critical or it gas more than 2 predecessors >>>>>>>> and at least one incoming edge >>>>>>>> is critical. This saved iterator can be used by extended phi >>>>>>>> predication. >>>>>>>> >>>>>>>> Here is motivated test-case which explains this point. >>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>>> The problem phi is in bb-7: >>>>>>>> >>>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>>> { >>>>>>>> <bb 5>: >>>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>>> if (xmax_17 == xmax_27) >>>>>>>> goto <bb 7>; >>>>>>>> else >>>>>>>> goto <bb 9>; >>>>>>>> >>>>>>>> } >>>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>>> { >>>>>>>> <bb 6>: >>>>>>>> if (xmax_17 == xmax_27) >>>>>>>> goto <bb 7>; >>>>>>>> else >>>>>>>> goto <bb 8>; >>>>>>>> >>>>>>>> } >>>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>>> { >>>>>>>> <bb 7>: >>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>> goto <bb 11>; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out >>>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>>> #if 0 >>>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>>> gsi = bb_insert_point (bb); >>>>>>>> else >>>>>>>> #endif >>>>>>>> gsi = gsi_after_labels (bb); >>>>>>>> >>>>>>>> we will get ICE: >>>>>>>> t5.c: In function 'foo': >>>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>>> void foo (int n) >>>>>>>> ^ >>>>>>>> for SSA_NAME: _1 in statement: >>>>>>>> _52 = _1 & _3; >>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>>> >>>>>>>> smce predicate computations were inserted in bb_7. >>>>>>> >>>>>>> The issue is obviously that the predicates have already been emitted >>>>>>> in the target BB - that's of course the wrong place. This is done >>>>>>> by insert_gimplified_predicates. >>>>>>> >>>>>>> This just shows how edge predicate handling is broken - we don't >>>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>>> but push those to e->dest which makes this really messy. >>>>>>> >>>>>>> Rather than having a separate phase where we insert all >>>>>>> gimplified bb predicates we should do that on-demand when >>>>>>> predicating a PHI. >>>>>>> >>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>>> the printfs properly. >>>>>>> >>>>>>> You also still have two functions for PHI predication. And the >>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>> paths. >>>>>>> >>>>>>> I'm not at all happy with this code. It may be existing if-conv codes >>>>>>> fault but making it even worse is not an option. >>>>>>> >>>>>>> Again - what's wrong with simply splitting critical edges if >>>>>>> aggressive_if_conv? I think that would very much simplify >>>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>>> commit edge insertions before merging the blocks. >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> ChangeLog is >>>>>>>> >>>>>>>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>>> statement iterator. >>>>>>>> (bb_insert_point): New function. >>>>>>>> (set_bb_insert_point): New function. >>>>>>>> (has_pred_critical_p): New function. >>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>> non-critical incoming edge. >>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>>> Change check that block containing reduction statement candidate >>>>>>>> is predecessor of phi-block since phi may have more than two arguments. >>>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>>> is_cond_scalar_reduction. >>>>>>>> (get_predicate_for_edge): New function. >>>>>>>> (struct phi_args_hash_traits): New type. >>>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>>> (gen_phi_arg_condition): New function. >>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and >>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>>> has 2 predecessors and both incoming edges are critical or it has more >>>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>>> Use standard gsi_after_labels otherwise. >>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to >>>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>>> number of predecessors is geater 2 and at least one incoming >>>>>>>> edge is >>>>>>>> critical. >>>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>>> Insert predicate computation of BB just after label if >>>>>>>> EXTENDED_PREDICATION is true. >>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>>> >>>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>>> >>>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>>> tree predicate; >>>>>>>>>> >>>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>>> recorded here, in order to avoid the duplication of computations >>>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>>> >>>>>>>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>>> } *bb_predicate_p; >>>>>>>>>> >>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>>>>>> insertion code for predicate computation. I checked that this fix >>>>>>>>>> works. >>>>>>>>> >>>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>>> after the PHI we predicate. >>>>>>>>> >>>>>>>>> Well, your updated patch will come with testcases for the testsuite >>>>>>>>> that will hopefully fail if doing that. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>>> >>>>>>>>>> Best regards. >>>>>>>>>> Yuri. >>>>>>>>>> >>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener >>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>>> >>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>>> insertion point - if bb has use of phi result insert phi >>>>>>>>>>>> predication >>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>>>>>>> splitting is not a good decision. >>>>>>>>>>> >>>>>>>>>>> Why not always insert before the use? Which would be after labels, >>>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you >>>>>>>>>>> predicate >>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>>> can only happen for backedges but those you can't remove in any >>>>>>>>>>> case. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Best regards. >>>>>>>>>>>> Yuri. >>>>>>>>>>>> >>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener >>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>>> >>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>>> >>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>>>>>>> >>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments >>>>>>>>>>>>>> only >>>>>>>>>>>>>> and only one check can be used for phi predication (for >>>>>>>>>>>>>> reduction in >>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do >>>>>>>>>>>>>> it >>>>>>>>>>>>>> even if we sort all phi argument values since we still have to >>>>>>>>>>>>>> produce >>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see >>>>>>>>>>>>>> comments >>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>>> >>>>>>>>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>>> >>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>>>>>>> >>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>>> >>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>>> >>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c >>>>>>>>>>>>>> is >>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes >>>>>>>>>>>>>> has >>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>>> >>>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>>> >>>>>>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result >>>>>>>>>>>>>> can >>>>>>>>>>>>>> have use in this block too, so we can't put predication code to >>>>>>>>>>>>>> the >>>>>>>>>>>>>> block end. >>>>>>>>>>>>> >>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>>> >>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to >>>>>>>>>>>>> such issues, >>>>>>>>>>>>> like splitting the edge! Certainly not involving a function >>>>>>>>>>>>> walking >>>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it >>>>>>>>>>>>>>>> may >>>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was >>>>>>>>>>>>>>>> introduced >>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are >>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI >>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced to >>>>>>>>>>>>>>>> detect such phi. >>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not >>>>>>>>>>>>>>>> critical - it >>>>>>>>>>>>>>>> guarantees that all computations related to predicate of >>>>>>>>>>>>>>>> normal edge >>>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the >>>>>>>>>>>>>>>> beginning of >>>>>>>>>>>>>>>> block. But this is not true for critical edges for which >>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>> computations are in the block where code for phi predication >>>>>>>>>>>>>>>> must be >>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced >>>>>>>>>>>>>>>> which is >>>>>>>>>>>>>>>> simply found out the last statement in block defining >>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication >>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>>> predicate_extended_scalar_phi and >>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after >>>>>>>>>>>>>>> value >>>>>>>>>>>>>>> and handle equal values specially in >>>>>>>>>>>>>>> predicate_extended_scalar_phi? >>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All SSA >>>>>>>>>>>>>>> names >>>>>>>>>>>>>>> required for the predicates are defined upward - and the >>>>>>>>>>>>>>> complex CFG >>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will >>>>>>>>>>>>>>> dominate the >>>>>>>>>>>>>>> inserted code if you insert after labels just like for the >>>>>>>>>>>>>>> other case. >>>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of >>>>>>>>>>>>>>> course needs >>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens >>>>>>>>>>>>>>> already?) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag >>>>>>>>>>>>>>> even >>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args >>>>>>>>>>>>>>> multiple >>>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this >>>>>>>>>>>>>>> as >>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose >>>>>>>>>>>>>>>> access >>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means >>>>>>>>>>>>>>>> non-extended >>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables >>>>>>>>>>>>>>>> EXTENDED and >>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has >>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi >>>>>>>>>>>>>>>> depending on >>>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated >>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just >>>>>>>>>>>>>>>> after label >>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
patch.20141218
Description: Binary data