On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > I changed algorithm for bool pattern repair. > It turned out that ifcvt_local_dce phaase is required since for > test-case I sent you in previous mail vectorization is not performed > without dead code elimination: > > For the loop > #pragma omp simd safelen(8) > for (i=0; i<512; i++) > { > float t = a[i]; > if (t > 0.0f & t < 1.0e+17f) > if (c[i] != 0) > res += 1; > } > > I've got the following message from vectorizer: > > t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0; > > t3.c:10:11: note: bit-precision arithmetic not supported. > t3.c:10:11: note: not vectorized: relevant stmt not supported: > _ifc__39 = t_5 > 0.0; > > It is caused by the following dead predicate computations after > critical edge splitting: > > (after combine blocks): > > <bb 3>: > # res_15 = PHI <res_1(7), 0(19)> > # i_16 = PHI <i_11(7), 0(19)> > # ivtmp_14 = PHI <ivtmp_13(7), 512(19)> > t_5 = a[i_16]; > _6 = t_5 > 0.0; > _7 = t_5 < 9.9999998430674944e+16; > _8 = _6 & _7; > _10 = &c[i_16]; > _ifc__36 = _8 ? 4294967295 : 0; > _9 = MASK_LOAD (_10, 0B, _ifc__36); > _28 = _8; > _29 = _9 != 0; > _30 = _28 & _29; > // Statements below are dead!! > _31 = _8; > _32 = _9 != 0; > _33 = ~_32; > _34 = _31 & _33; > // End of dead statements. > _ifc__35 = _30 ? 1 : 0; > res_1 = res_15 + _ifc__35; > i_11 = i_16 + 1; > ivtmp_13 = ivtmp_14 - 1; > if (ivtmp_13 != 0) > goto <bb 7>; > else > goto <bb 8>; > > But if we delete these statements loop will be vectorized.
Hm, ok. We insert predicates too early obviously and not only when needed. But let's fix that later. > New patch is attached. fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs) { tree rhs1, lhs1, cond_expr; + + /* If COND is comparison r != 0 and r has boolean type, convert COND + to SSA_NAME to accept by vect bool pattern. */ + if (TREE_CODE (cond) == NE_EXPR) + { + tree op0 = TREE_OPERAND (cond, 0); + tree op1 = TREE_OPERAND (cond, 1); + if (TREE_CODE (op0) == SSA_NAME + && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE + && (integer_zerop (op1))) + cond = op0; + else if (TREE_CODE (op1) == SSA_NAME + && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE + && (integer_zerop (op0))) + cond = op1; The 2nd form, 0 != SSA_NAME doesn't happen due to operand canonicalization. Please remove its handling. + if (gimple_phi_num_args (phi) != 2) + { + if (!aggressive_if_conv) && !aggressive_if_conv + if (EDGE_COUNT (bb->preds) > 2) + { + if (!aggressive_if_conv) Likewise. - gimple reduc; + && (rhs = gimple_phi_arg_def (phi, 0)))) { the { goes to the next line static void predicate_mem_writes (loop_p loop) { - unsigned int i, orig_loop_num_nodes = loop->num_nodes; + unsigned int i, j, orig_loop_num_nodes = loop->num_nodes; + tree mask_vec[10]; an upper limit of 10? + for (j=0; j<10; j++) spaces around '<' and '=' + mask_vec[j] = NULL_TREE; + + gcc_assert (exact_log2 (bitsize) != -1); + if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE) + { this seems to be a completely separate "optimization"? Note that there are targets with non-power-of-two bitsize modes (PSImode), so the assert will likely trigger. I would prefer if you separate this part of the patch. + if ( gimple_code (stmt) != GIMPLE_ASSIGN) + continue; no space before gimple_code + imm_use_iterator imm_iter; + + + worklist.create (64); excessive vertical space. The patch misses the addition of new testcases - please add some, otherwise the code will be totally untested. I assume the patch passes bootstrap and regtest (you didn't say so). Can you also do a bootstrap with aggressive_if_conv forced to true and --with-build-config=bootstrap-O3 --disable-werror? Thanks, Richard. > Thanks. > Yuri. > > 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> I am sending you full patch (~1000 lines) but if you need only patch.1 >>> and patch.2 will let me know and i'll send you reduced patch. >>> >>> Below are few comments regarding your remarks for patch.3. >>> >>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case >>> when dead code elimination is required to vectorize loop, i.e. dead >>> statement is marked as relevant. >>> 2. You wrote: >>>> The "retry" code also looks odd - why do you walk the BB multiple >>>> times instead of just doing sth like >>>> >>>> while (!has_single_use (lhs)) >>>> { >>>> gimple copy = ifcvt_split_def_stmt (def_stmt); >>>> ifcvt_walk_pattern_tree (copy); >>>> } >>>> >>>> thus returning the copy you create and re-process it (the copy should >>>> now have a single-use). >>> >>> The problem is that not only top SSA_NAME (lhs) may have multiple uses >>> but some intermediate variables too. For example, for the following >>> test-case >>> >>> float a[1000]; >>> int c[1000]; >>> >>> int foo() >>> { >>> int i, res = 0; >>> #pragma omp simd safelen(8) >>> for (i=0; i<512; i++) >>> { >>> float t = a[i]; >>> if (t > 0.0f & t < 1.0e+17f) >>> if (c[i] != 0) >>> res += 1; >>> } >>> return res; >>> } >>> >>> After combine_blocks we have the following bb: >>> >>> <bb 3>: >>> # res_15 = PHI <res_1(7), 0(15)> >>> # i_16 = PHI <i_11(7), 0(15)> >>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)> >>> t_5 = a[i_16]; >>> _6 = t_5 > 0.0; >>> _7 = t_5 < 9.9999998430674944e+16; >>> _8 = _6 & _7; >>> _10 = &c[i_16]; >>> _ifc__32 = _8 ? 4294967295 : 0; >>> _9 = MASK_LOAD (_10, 0B, _ifc__32); >>> _28 = _8; >>> _29 = _9 != 0; >>> _30 = _28 & _29; >>> _ifc__31 = _30 ? 1 : 0; >>> res_1 = res_15 + _ifc__31; >>> i_11 = i_16 + 1; >>> ivtmp_13 = ivtmp_14 - 1; >>> if (ivtmp_13 != 0) >>> goto <bb 7>; >>> else >>> goto <bb 8>; >>> >>> and we can see that _8 has multiple uses. Also note that after splitting of >>> _8 = _6 & _7 >>> we also get multiple uses for definition of _6 and _7. So I used this >>> iterative algorithm as the simplest one. >> >> But it walks the entire pattern again and again while you only need to >> ensure you walk the pattern tree of the now single-use DEF again >> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt >> you should pass down the user_operand_p that you need to make >> single-use). >> >>> I think it would be nice to re-use some utility from tree-vect-patterns.c >>> for stmt_is_root_of_bool_pattern. >>> >>> I assume that function stmt_is_root_of_bool_pattern can be simplified >>> to check on COND_EXPR only since PHI predication and memory access >>> predication produced only such statements,i.e. it can look like >>> >>> static bool >>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var) >>> { >>> enum tree_code code; >>> tree lhs, rhs; >>> >>> code = gimple_assign_rhs_code (stmt); >>> if (code == COND_EXPR) >>> { >>> rhs = gimple_assign_rhs1 (stmt); >>> if (TREE_CODE (rhs) != SSA_NAME) >>> return false; >>> *var = rhs; >>> return true; >>> } >>> return false; >>> } >>> >>> I also did few minor changes in patch.2. >>> >>> 3. You can also notice that I inserted code in tree_if_conversion to >>> do loop version if explicit option "-ftree-loop-if-convert" was not >>> passed to compiler, i.e. we perform if-conversion for loop >>> vectorization only and if it does not take place, we should delete >>> if-converted version of loop. >>> What is your opinion? >> >> Overall part 1 and part 2 look good to me, predicate_scalar_phi >> looks in need of some refactoring to avoid duplicate code. We can >> do that a followup. >> >> Part 3 still needs the iteration to be resolved and make the use we >> actually care about single-use, not a random one so we can avoid >> iterating completely. >> >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Hi Richard, >>>>> >>>>> Here is updated patch which includes >>>>> (1) split critical edges for aggressive if conversion. >>>>> (2) delete all stuff related to support of critical edge predication. >>>>> (3) only one function - predicate_scalar_phi performs predication. >>>>> (4) function find_phi_replacement_condition was deleted since it was >>>>> included in predicate_scalar_phi for phi with two arguments. >>>>> >>>>> I checked that patch works in stress testing mode, i.e. with >>>>> aggressive if conversion by default. >>>>> >>>>> What is your opinion? >>>> >>>> Looks ok overall, but please simply do >>>> >>>> FOR_EACH_EDGE (e, ei, bb->succs) >>>> if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) >>>> split_edge (e); >>>> >>>> for all blocks apart from the latch. >>>> >>>> Can you please send a combined patch up to this one? Looking at >>>> the incremental diff is somewhat hard. Thus a patch including all >>>> patches from patch1 to this one. >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> >>>>> Thanks. >>>>> Yuri. >>>>> >>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> Thanks for your reply! >>>>>>> >>>>>>> I didn't understand your point: >>>>>>> >>>>>>> Well, I don't mind splitting all critical edges unconditionally >>>>>>> >>>>>>> but you do it unconditionally in proposed patch. >>>>>> >>>>>> I don't mind means I am fine with it. >>>>>> >>>>>>> Also I assume that >>>>>>> call of split_critical_edges() can break ssa. For example, we can >>>>>>> split headers of loops, loop exit blocks etc. >>>>>> >>>>>> How does that "break SSA"? You mean loop-closed SSA? I'd >>>>>> be surprised if so but that may be possible. >>>>>> >>>>>>> I prefer to do something >>>>>>> more loop-specialized, e.g. call edge_split() for critical edges >>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>>>>>> destination bb belongs to loop). >>>>>> >>>>>> That works for me as well but it is more complicated to implement. >>>>>> Ideally you'd only split one edge if you find a block with only critical >>>>>> predecessors (where we'd currently give up). But note that this >>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >>>>>> will change loop->num_nodes so we have to be more careful in >>>>>> constructing the loop calling if_convertible_bb_p. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> >>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Sorry that I forgot to delete debug dump from my fix. >>>>>>>>> I have few questions about your comments. >>>>>>>>> >>>>>>>>> 1. You wrote : >>>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>>> path >>>>>>>>> Did you mean that I must combine predicate_scalar_phi and >>>>>>>>> predicate_extended scalar phi to one function? >>>>>>>>> Please note that if additional flag was not set up (i.e. >>>>>>>>> aggressive_if_conv is false) extended predication is required more >>>>>>>>> compile time since it builds hash_map. >>>>>>>> >>>>>>>> It's compile-time complexity is reasonable enough even for >>>>>>>> non-aggressive if-conversion. >>>>>>>> >>>>>>>>> 2. About critical edge splitting. >>>>>>>>> >>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>>>>>> option only; (2) should we split all critical edges. >>>>>>>>> Note that this leads to recomputing of topological order. >>>>>>>> >>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>>>>>> do something like >>>>>>>> >>>>>>>> Index: gcc/tree-if-conv.c >>>>>>>> =================================================================== >>>>>>>> --- gcc/tree-if-conv.c (revision 218515) >>>>>>>> +++ gcc/tree-if-conv.c (working copy) >>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>>>>>> if (number_of_loops (fun) <= 1) >>>>>>>> return 0; >>>>>>>> >>>>>>>> + bool critical_edges_split_p = false; >>>>>>>> FOR_EACH_LOOP (loop, 0) >>>>>>>> if (flag_tree_loop_if_convert == 1 >>>>>>>> || flag_tree_loop_if_convert_stores == 1 >>>>>>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>>>>>> && !loop->dont_vectorize)) >>>>>>>> - todo |= tree_if_conversion (loop); >>>>>>>> + { >>>>>>>> + if (!critical_edges_split_p) >>>>>>>> + { >>>>>>>> + split_critical_edges (); >>>>>>>> + critical_edges_split_p = true; >>>>>>>> + todo |= TODO_cleanup_cfg; >>>>>>>> + } >>>>>>>> + todo |= tree_if_conversion (loop); >>>>>>>> + } >>>>>>>> >>>>>>>> #ifdef ENABLE_CHECKING >>>>>>>> { >>>>>>>> >>>>>>>>> It is worth noting that in current implementation bb's with 2 >>>>>>>>> predecessors and both are on critical edges are accepted without >>>>>>>>> additional option. >>>>>>>> >>>>>>>> Yes, I know. >>>>>>>> >>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>>>>>> to it and even fix the critical edge missed optimization with splitting >>>>>>>> critical edges then I am all for that solution. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Thanks ahead. >>>>>>>>> Yuri. >>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> Here is updated patch2 with the following changes: >>>>>>>>>>> 1. Delete functions phi_has_two_different_args and >>>>>>>>>>> find_insertion_point. >>>>>>>>>>> 2. Use only one function for extended predication - >>>>>>>>>>> predicate_extended_scalar_phi. >>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>>>>>> blocks if it has 2 predecessors and >>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors >>>>>>>>>>> and at least one incoming edge >>>>>>>>>>> is critical. This saved iterator can be used by extended phi >>>>>>>>>>> predication. >>>>>>>>>>> >>>>>>>>>>> Here is motivated test-case which explains this point. >>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>>>>>> The problem phi is in bb-7: >>>>>>>>>>> >>>>>>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>>>>>> { >>>>>>>>>>> <bb 5>: >>>>>>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>>> goto <bb 7>; >>>>>>>>>>> else >>>>>>>>>>> goto <bb 9>; >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>>>>>> { >>>>>>>>>>> <bb 6>: >>>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>>> goto <bb 7>; >>>>>>>>>>> else >>>>>>>>>>> goto <bb 8>; >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>>>>>> { >>>>>>>>>>> <bb 7>: >>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>> goto <bb 11>; >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment >>>>>>>>>>> out >>>>>>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>>>>>> #if 0 >>>>>>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>>>>>> gsi = bb_insert_point (bb); >>>>>>>>>>> else >>>>>>>>>>> #endif >>>>>>>>>>> gsi = gsi_after_labels (bb); >>>>>>>>>>> >>>>>>>>>>> we will get ICE: >>>>>>>>>>> t5.c: In function 'foo': >>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>>>>>> void foo (int n) >>>>>>>>>>> ^ >>>>>>>>>>> for SSA_NAME: _1 in statement: >>>>>>>>>>> _52 = _1 & _3; >>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>>>>>> >>>>>>>>>>> smce predicate computations were inserted in bb_7. >>>>>>>>>> >>>>>>>>>> The issue is obviously that the predicates have already been emitted >>>>>>>>>> in the target BB - that's of course the wrong place. This is done >>>>>>>>>> by insert_gimplified_predicates. >>>>>>>>>> >>>>>>>>>> This just shows how edge predicate handling is broken - we don't >>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>>>>>> but push those to e->dest which makes this really messy. >>>>>>>>>> >>>>>>>>>> Rather than having a separate phase where we insert all >>>>>>>>>> gimplified bb predicates we should do that on-demand when >>>>>>>>>> predicating a PHI. >>>>>>>>>> >>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>>>>>> the printfs properly. >>>>>>>>>> >>>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>>> paths. >>>>>>>>>> >>>>>>>>>> I'm not at all happy with this code. It may be existing if-conv >>>>>>>>>> codes >>>>>>>>>> fault but making it even worse is not an option. >>>>>>>>>> >>>>>>>>>> Again - what's wrong with simply splitting critical edges if >>>>>>>>>> aggressive_if_conv? I think that would very much simplify >>>>>>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>>>>>> commit edge insertions before merging the blocks. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> ChangeLog is >>>>>>>>>>> >>>>>>>>>>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>>>>>> statement iterator. >>>>>>>>>>> (bb_insert_point): New function. >>>>>>>>>>> (set_bb_insert_point): New function. >>>>>>>>>>> (has_pred_critical_p): New function. >>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>>>>>> Change check that block containing reduction statement candidate >>>>>>>>>>> is predecessor of phi-block since phi may have more than two >>>>>>>>>>> arguments. >>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>> (struct phi_args_hash_traits): New type. >>>>>>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>>>>>> (gen_phi_arg_condition): New function. >>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set >>>>>>>>>>> it >>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition >>>>>>>>>>> and >>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has >>>>>>>>>>> more >>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>>>>>> Use standard gsi_after_labels otherwise. >>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>>>>>> (insert_gimplified_predicates): Add bool variable >>>>>>>>>>> EXTENDED_PREDICATION >>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to >>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>>>>>> number of predecessors is geater 2 and at least one >>>>>>>>>>> incoming edge is >>>>>>>>>>> critical. >>>>>>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>>>>>> Insert predicate computation of BB just after label if >>>>>>>>>>> EXTENDED_PREDICATION is true. >>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>>>>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>>>>>> >>>>>>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>>>>>> >>>>>>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>>>>>> tree predicate; >>>>>>>>>>>>> >>>>>>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>>>>>> recorded here, in order to avoid the duplication of >>>>>>>>>>>>> computations >>>>>>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>>>>>> >>>>>>>>>>>>> /* Insertion point for blocks having incoming critical edges. >>>>>>>>>>>>> */ >>>>>>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>>>>>> } *bb_predicate_p; >>>>>>>>>>>>> >>>>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix >>>>>>>>>>>>> works. >>>>>>>>>>>> >>>>>>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>>>>>> after the PHI we predicate. >>>>>>>>>>>> >>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite >>>>>>>>>>>> that will hopefully fail if doing that. >>>>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards. >>>>>>>>>>>>> Yuri. >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi >>>>>>>>>>>>>>> predication >>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical >>>>>>>>>>>>>>> edge >>>>>>>>>>>>>>> splitting is not a good decision. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Why not always insert before the use? Which would be after >>>>>>>>>>>>>> labels, >>>>>>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you >>>>>>>>>>>>>> predicate >>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its >>>>>>>>>>>>>> incoming >>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any >>>>>>>>>>>>>> case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener >>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of >>>>>>>>>>>>>>>>> gimple_phi_arg_edge. >>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 >>>>>>>>>>>>>>>>> arguments only >>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for >>>>>>>>>>>>>>>>> reduction in >>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't >>>>>>>>>>>>>>>>> do it >>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have >>>>>>>>>>>>>>>>> to produce >>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see >>>>>>>>>>>>>>>>> comments >>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> How so? We can always use !(condition) for the "last" value, >>>>>>>>>>>>>>>> thus >>>>>>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its >>>>>>>>>>>>>>>> inversion >>>>>>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( >>>>>>>>>>>>>>>>> t5.c is >>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi >>>>>>>>>>>>>>>>> nodes has >>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing >>>>>>>>>>>>>>>>> edge >>>>>>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi >>>>>>>>>>>>>>>>> arguments >>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi >>>>>>>>>>>>>>>>> result can >>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code >>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>> block end. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to >>>>>>>>>>>>>>>> such issues, >>>>>>>>>>>>>>>> like splitting the edge! Certainly not involving a function >>>>>>>>>>>>>>>> walking >>>>>>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since >>>>>>>>>>>>>>>>>>> it may >>>>>>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was >>>>>>>>>>>>>>>>>>> introduced >>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are >>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI >>>>>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced to >>>>>>>>>>>>>>>>>>> detect such phi. >>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption >>>>>>>>>>>>>>>>>>> that at >>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not >>>>>>>>>>>>>>>>>>> critical - it >>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of >>>>>>>>>>>>>>>>>>> normal edge >>>>>>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the >>>>>>>>>>>>>>>>>>> beginning of >>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which >>>>>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>>>>> computations are in the block where code for phi >>>>>>>>>>>>>>>>>>> predication must be >>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is >>>>>>>>>>>>>>>>>>> introduced which is >>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining >>>>>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi >>>>>>>>>>>>>>>>>>> predication code >>>>>>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 >>>>>>>>>>>>>>>>>> @@ >>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and >>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs >>>>>>>>>>>>>>>>>> after value >>>>>>>>>>>>>>>>>> and handle equal values specially in >>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi? >>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All >>>>>>>>>>>>>>>>>> SSA names >>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the >>>>>>>>>>>>>>>>>> complex CFG >>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will >>>>>>>>>>>>>>>>>> dominate the >>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the >>>>>>>>>>>>>>>>>> other case. >>>>>>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of >>>>>>>>>>>>>>>>>> course needs >>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens >>>>>>>>>>>>>>>>>> already?) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag >>>>>>>>>>>>>>>>>> even >>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args >>>>>>>>>>>>>>>>>> multiple >>>>>>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do >>>>>>>>>>>>>>>>>> this as >>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize >>>>>>>>>>>>>>>>>> flag >>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply >>>>>>>>>>>>>>>>>> 'aggressive'. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 >>>>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose >>>>>>>>>>>>>>>>>>> access >>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get >>>>>>>>>>>>>>>>>>> phi >>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to >>>>>>>>>>>>>>>>>>> insert >>>>>>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means >>>>>>>>>>>>>>>>>>> non-extended >>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add >>>>>>>>>>>>>>>>>>> argument >>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables >>>>>>>>>>>>>>>>>>> EXTENDED and >>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi >>>>>>>>>>>>>>>>>>> has more >>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. >>>>>>>>>>>>>>>>>>> Invoke >>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi >>>>>>>>>>>>>>>>>>> depending on >>>>>>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that >>>>>>>>>>>>>>>>>>> non-predicated block >>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just >>>>>>>>>>>>>>>>>>> after label >>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of >>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.