On Fri, Jul 3, 2015 at 11:37 AM, Alan Lawrence <[email protected]> wrote:
> Abe wrote:
>
>> In other words, the problem about which I was concerned is not going to be
>> triggered by e.g. "if (c) x = ..."
>> which lacks an attached "else x = ..." in a multithreaded program without
>> enough locking just because 'x' is global/static.
>>
>> The only remaining case to consider is if some code being compiler takes
>> the address of something thread-local and then "gives"
>> that pointer to another thread. Even for _that_ extreme case, Sebastian
>> says that the gimplifier will detect this
>> "address has been taken" situation and do the right thing such that the
>> new if converter also does the right thing.
>
>
> Great :). I don't understand much/anything about how gcc deals with
> thread-locals, but everything before that, all sounds good...
>
>> [Alan wrote:]
>>
>>> Can you give an example?
>>
>>
>> The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and
>> "gcc.dg/vect/vect-mask-load-1.c"
>> currently test as: the new if-converter is "converting" something that`s
>> already vectorizer-friendly...
>
>> [snip]
>>
>> However, TTBOMK the vectorizer already "understands" that in cases where
>> its input looks like:
>>
>> x = c ? y : z;
>>
>> ... and 'y' and 'z' are both pure [side-effect-free] -- including, but not
>> limited to, they must be non-"volatile" --
>> it may vectorize a loop containing code like the preceding, ignoring for
>> this particular instance the C mandate
>> that only one of {y, z} be evaluated...
>
>
> My understanding, is that any decision as to whether one or both of y or z
> is evaluated (when 'evaluation' involves doing any work, e.g. a load), has
> already been encoded into the gimple/tree IR. Thus, if we are to only
> evaluate one of 'y' or 'z' in your example, the IR will (prior to
> if-conversion), contain basic blocks and control flow, that means we jump
> around the one that's not evaluated.
>
> This appears to be the case in pr61194.c: prior to if-conversion, the IR for
> the loop in barX is
>
> <bb 3>:
> # i_16 = PHI <i_13(7), 0(2)>
> # ivtmp_21 = PHI <ivtmp_20(7), 1024(2)>
> _5 = x[i_16];
> _6 = _5 > 0.0;
> _7 = w[i_16];
> _8 = _7 < 0.0;
> _9 = _6 & _8;
> if (_9 != 0)
> goto <bb 4>;
> else
> goto <bb 5>;
>
> <bb 4>:
> iftmp.0_10 = z[i_16];
> goto <bb 6>;
>
> <bb 5>:
> iftmp.0_11 = y[i_16];
>
> <bb 6>:
> # iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)>
> z[i_16] = iftmp.0_2;
> i_13 = i_16 + 1;
> ivtmp_20 = ivtmp_21 - 1;
> if (ivtmp_20 != 0)
> goto <bb 7>;
> else
> goto <bb 8>;
>
> <bb 7>:
> goto <bb 3>;
>
> which clearly contains (unvectorizable!) control flow. Without
> -ftree-loop-if-convert-stores, if-conversion leaves this alone, and
> vectorization fails (i.e. the vectorizer bails out because the loop has >2
> basic blocks). With -ftree-loop-if-convert-stores, if-conversion produces
>
> <bb 3>:
> # i_16 = PHI <i_13(4), 0(2)>
> # ivtmp_21 = PHI <ivtmp_20(4), 1024(2)>
> _5 = x[i_16];
> _6 = _5 > 0.0;
> _7 = w[i_16];
> _8 = _7 < 0.0;
> _9 = _6 & _8;
> iftmp.0_10 = z[i_16]; // <== here
> iftmp.0_11 = y[i_16]; // <== here
> iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11;
> z[i_16] = iftmp.0_2;
> i_13 = i_16 + 1;
> ivtmp_20 = ivtmp_21 - 1;
> if (ivtmp_20 != 0)
> goto <bb 4>;
> else
> goto <bb 5>;
>
> <bb 4>:
> goto <bb 3>;
>
> where I have commented the conditional loads that have become unconditional.
> (Hence, "-ftree-loop-if-convert-stores" looks misnamed - it affects how the
> if-conversion phase converts loads, too - please correct me if I
> misunderstand (Richard?) ?!) This contains no control flow, and so is
> vectorizable.
Not sure why the above is only handled with -ftree-loop-if-convert-stores. It's
clearly if-converting the load as the store is unconditional. Ok, the dump says
iftmp.0_10 = z[i_16];
tree could trap...
from
if (gimple_assign_rhs_could_trap_p (stmt))
{
if (ifcvt_can_use_mask_load_store (stmt))
{
gimple_set_plf (stmt, GF_PLF_2, true);
*any_mask_load_store = true;
return true;
}
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "tree could trap...\n");
return false;
I suppose that should use ifcvt_could_trap_p instead (clearly we write to
z[i_16] unconditionally after the conditional load). Note that it says
the access may trap because the check is somewhat too conservative,
not using range-info of i_16.
Looks like to do this we'd need to enable some analysis code
currently run conditional on flag_tree_loop_if_convert_stores only.
Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c (revision 225599)
+++ gcc/tree-if-conv.c (working copy)
@@ -874,7 +874,7 @@ if_convertible_gimple_assign_stmt_p (gim
return true;
}
- if (gimple_assign_rhs_could_trap_p (stmt))
+ if (ifcvt_could_trap_p (stmt, refs))
{
if (ifcvt_can_use_mask_load_store (stmt))
{
@@ -1297,18 +1297,15 @@ if_convertible_loop_p_1 (struct loop *lo
}
}
- if (flag_tree_loop_if_convert_stores)
- {
- data_reference_p dr;
+ data_reference_p dr;
- for (i = 0; refs->iterate (i, &dr); i++)
- {
- dr->aux = XNEW (struct ifc_dr);
- DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
- DR_RW_UNCONDITIONALLY (dr) = -1;
- }
- predicate_bbs (loop);
+ for (i = 0; refs->iterate (i, &dr); i++)
+ {
+ dr->aux = XNEW (struct ifc_dr);
+ DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
+ DR_RW_UNCONDITIONALLY (dr) = -1;
}
+ predicate_bbs (loop);
for (i = 0; i < loop->num_nodes; i++)
{
@@ -1323,9 +1320,8 @@ if_convertible_loop_p_1 (struct loop *lo
return false;
}
- if (flag_tree_loop_if_convert_stores)
- for (i = 0; i < loop->num_nodes; i++)
- free_bb_predicate (ifc_bbs[i]);
+ for (i = 0; i < loop->num_nodes; i++)
+ free_bb_predicate (ifc_bbs[i]);
/* Checking PHIs needs to be done after stmts, as the fact whether there
are any masked loads or stores affects the tests. */
@@ -1399,14 +1395,10 @@ if_convertible_loop_p (struct loop *loop
res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs,
any_mask_load_store);
- if (flag_tree_loop_if_convert_stores)
- {
- data_reference_p dr;
- unsigned int i;
-
- for (i = 0; refs.iterate (i, &dr); i++)
- free (dr->aux);
- }
+ data_reference_p dr;
+ unsigned int i;
+ for (i = 0; refs.iterate (i, &dr); i++)
+ free (dr->aux);
free_data_refs (refs);
free_dependence_relations (ddrs);
fixes the testcase for me.
Richard.
>
> (This is all without your scratchpad patch, of course.) IOW this being
> vectorized, or not, relies upon the preceding if-conversion phase removing
> the control flow.
>
> HTH
> Alan
>