On Tue, Aug 4, 2015 at 7:05 AM, Jeff Law <l...@redhat.com> wrote:
> On 07/17/2015 01:57 PM, Abe wrote:
>>
>> Dear all,
>>
>> Relative to the previous submission of this same patch, the below
>> corrects some minor spacing and/or indentation issues,
>> misc. other formatting fixes, and makes the disabled vectorization tests
>> be disabled via "xfail" rather than by adding spaces to
>> deliberately cause the relevant scanned-for text to not be found by
>> DejaGNU so as to prevent the DejaGNU line from being interpreted.
>>
>> The below is also based on a Git checkout that was rebased to the latest
>> upstream check-in from today,
>> so it should merge cleanly with trunk as of today.
>>
>> Regards,
>>
>> Abe
>>
>>
>>
>>
>>
>>
>>
>>
>> 2015-06-12  Sebastian Pop  <s....@samsung.com>
>>              Abe Skolnik  <a.skol...@samsung.com>
>>
>>      PR tree-optimization/46029
>>      * tree-data-ref.c (struct data_ref_loc_d): Moved...
>>      (get_references_in_stmt): Exported.
>>      * tree-data-ref.h (struct data_ref_loc_d): ... here.
>>      (get_references_in_stmt): Declared.
>>
>>      * tree-if-conv.c (struct ifc_dr): Removed.
>>      (IFC_DR): Removed.
>>      (DR_WRITTEN_AT_LEAST_ONCE): Removed.
>>      (DR_RW_UNCONDITIONALLY): Removed.
>>      (memrefs_read_or_written_unconditionally): Removed.
>>      (write_memrefs_written_at_least_once): Removed.
>>      (ifcvt_could_trap_p): Does not take refs parameter anymore.
>>      (ifcvt_memrefs_wont_trap): Removed.
>>      (has_non_addressable_refs): New.
>>      (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
>>      Removed use of refs.
>>      (if_convertible_stmt_p): Removed use of refs.
>>      (if_convertible_gimple_assign_stmt_p): Same.
>>      (if_convertible_loop_p_1): Removed use of refs.  Remove
>> initialization
>>      of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
>>      (insert_address_of): New.
>>      (create_scratchpad): New.
>>      (create_indirect_cond_expr): New.
>>      (predicate_mem_writes): Call create_indirect_cond_expr.  Take an
>> extra
>>      parameter for scratch_pad.
>>      (combine_blocks): Same.
>>      (tree_if_conversion): Same.
>>
>>      testsuite/
>>      * g++.dg/tree-ssa/ifc-pr46029.C: New.
>>      * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
>>      * gcc.dg/tree-ssa/ifc-8.c: New.
>>      * gcc.dg/tree-ssa/ifc-9.c: New.
>>      * gcc.dg/tree-ssa/ifc-10.c: New.
>>      * gcc.dg/tree-ssa/ifc-11.c: New.
>>      * gcc.dg/tree-ssa/ifc-12.c: New.
>>      * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
>
>
>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 6b2ccbc..49f6b9f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1413,7 +1413,7 @@ Common Report Var(flag_tree_loop_if_convert)
>> Init(-1) Optimization
>>   Convert conditional jumps in innermost loops to branchless equivalents
>>
>>   ftree-loop-if-convert-stores
>> -Common Report Var(flag_tree_loop_if_convert_stores) Optimization
>> +Common Report Var(flag_tree_loop_if_convert_stores) Init(-1) Optimization
>>   Also if-convert conditional jumps containing memory writes
>>
>>   ; -finhibit-size-directive inhibits output of .size for ELF.
>
> I don't see this change mentioned anywhere in the ChangeLog.  That seems to
> be a relatively common situation.  I called out some of those issues, but
> didn't try to catch them all.  Please make sure all changes are reflected in
> the ChangeLog.
>
>
>
>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> index f392fbe..775fcd5 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>
> This change isn't mentioned in the ChangeLog.
>
>
>  diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>>
>> b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> index 875d2d3..fc69ca2 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" {
>> target *-*-* } } */
>> +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores
>> -fdump-tree-ifcvt-stats" { target *-*-* } } */
>
> ISTM this really should be two tests, one with this code as-is, another that
> exactly matches the ffmpeg kernel.
>
>
>
>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> index 11e9533..cbd3378 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>
> I don't see this mentioned in the ChangeLog.  It also doesn't look like you
> actually disabled the test.  Obviously this will need to be addressed before
> your patch could go in.
>
>> diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> index 180b490..aedc66a 100644
>> --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>
> Not mentioned in the ChangeLog.   xfail needs to be fixed.
>
> Similarly for the others were you added xfails.
>
>
>> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
>> index e6ff4ef..2a1e27d 100644
>> --- a/gcc/tree-data-ref.c
>> +++ b/gcc/tree-data-ref.c
>> @@ -4363,22 +4363,11 @@ compute_all_dependences (vec<data_reference_p>
>> datarefs,
>>     return true;
>>   }
>>
>> -/* Describes a location of a memory reference.  */
>> -
>> -typedef struct data_ref_loc_d
>> -{
>> -  /* The memory reference.  */
>> -  tree ref;
>> -
>> -  /* True if the memory reference is read.  */
>> -  bool is_read;
>> -} data_ref_loc;
>
> Moving data_ref_loc_d could potentially be split out and merged in
> immediately assuming we've agreed that you're likely going to need
> data_ref_loc from within tree-data-ref.c and tree-if-conv.c.
>
> Similarly for exporting get_references_in_stmt.
>
> Picking out these things that are necessary prerequisites, but not part of
> the core changes you need to make will make the overall review process
> simpler.
>
>
>
>> +ifcvt_could_trap_p (gimple stmt)
>
> Presumably you're able to consider something with a vuse which does not
> otherwise trap as non-trapping because of the use of the scratchpad?
>
> Perhaps you should clarify the comment to indicate what the "refined for the
> needs of the if-conversion" actually means.  Remember, someone other than
> you might be reading this code in the future.
>
>
>> @@ -1459,8 +1309,7 @@ is_cond_scalar_reduction (gimple phi, gimple
>> *reduc, tree arg_0, tree arg_1,
>>     if (PHI_ARG_DEF_FROM_EDGE (header_phi, latch_e) != PHI_RESULT (phi))
>>       return false;
>>
>> -  if (gimple_code (stmt) != GIMPLE_ASSIGN
>> -      || gimple_has_volatile_ops (stmt))
>> +  if (gimple_code (stmt) != GIMPLE_ASSIGN || gimple_has_volatile_ops
>> (stmt))
>>       return false;
>
> This looks like a gratutious change.  It probably doesn't matter either way.
> If you really want to make this change, please do so independent of your
> patch.
>
>
>>
>>     if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> @@ -1882,8 +1731,7 @@ insert_gimplified_predicates (loop_p loop, bool
>> any_mask_load_store)
>>         stmts = bb_predicate_gimplified_stmts (bb);
>>         if (stmts)
>>       {
>> -      if (flag_tree_loop_if_convert_stores
>> -          || any_mask_load_store)
>> +      if (flag_tree_loop_if_convert_stores || any_mask_load_store)
>
> Similarly.  Pure whitespace/formatting changes should be handled
> independently.
>
>> @@ -1925,6 +1773,80 @@ mask_exists (int size, vec<int> vec)
>>     return -1;
>>   }
>>
>> +/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
>> +   returns the ADDRESS_OF_AI.  */
>> +
>> +static tree
>> +insert_address_of (tree ai, gimple_stmt_iterator *gsi)
>
> Presumably we've verified ai is addressable prior to calling this routine
> :-)

+static tree
+insert_address_of (tree ai, gimple_stmt_iterator *gsi)
+{
+  tree addr_expr = build_fold_addr_expr (ai);
+  tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
+  gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;

please simply use

  gimple addr_stmt = gimple_build_assign (make_temp_ssa_name
(TREE_TYPE (addr_expr), NULL, "ifc"), addr_expr);
  address_of_ai = gimple_assign_lhs (addr_stmt);

maybe you can simplify other code as well in this way.

As Jeff said you better made sure 'ai' is TREE_ADDRESSABLE (I suppose
this is always
the scratchpad).  Otherwise you seriously confuse the alias machinery.
So do, at the
beginning of this function

  gcc_assert (TREE_ADDRESSABLE (get_base_address (ai)));

>
>> +
>> +/* Insert at the beginning of the first basic block of the current
>> +   function the allocation on the stack of N_BYTES of memory and
>> +   return a pointer to this scratchpad memory.  */
>> +
>> +static tree
>> +create_scratchpad (unsigned int n_bytes)
>> +{
>> +  basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) );
>
> Whitespace is wrong.  Should be
>  = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));

+static tree
+create_scratchpad (unsigned int n_bytes)
+{
+  basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) );
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  tree x = build_int_cst (integer_type_node, n_bytes - 1);
+  tree elt_type = char_type_node;
+  tree array_type = build_array_type (elt_type, build_index_type (x));
+  tree base = create_tmp_reg (array_type, "scratch_pad");
+
+  return insert_address_of (base, &gsi);

also use create_tmp_var and add

 TREE_ADDRESSABLE (base) = 1;

note that the vectorizer might want to align the scratchpad and aligning
stack locals can be expensive.  So I wonder if using a global is better?
(yeah, you get false dependencies in the CPU with threads again, so
you could even use TLS vars...).  Not sure if we really need to worry.

>
>> +/* Returns a memory reference to the pointer defined by the
>> +   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
>> +   inserts this code at GSI.  */
>> +
>> +static tree
>> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
>> +               gimple_stmt_iterator *gsi, bool swap)
>> +{
>> +  tree cond_expr;
>> +  tree pointer;
>> +  gimple pointer_stmt;
>> +
>> +  /* address_of_ai = &A[i];  */
>> +  tree address_of_ai = insert_address_of (ai, gsi);
>> +
>> +  /* Allocate the scratch pad only once per function.  */
>> +  if (!*scratch_pad)
>> +    *scratch_pad = create_scratchpad (64);
>
> Obviously the hardcoded 64 needs to change.

You can iterate over vector modes looking for the biggest one, also
consider word_mode.

>> +
>> +  /* pointer = cond ? address_of_ai : scratch_pad;  */
>> +  pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");

See above - make_temp_ssa_name

>> +  cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
>> +              unshare_expr (cond),
>> +              swap ? *scratch_pad  : address_of_ai,
>> +              swap ? address_of_ai : *scratch_pad);

Don't use build3 but directly build the gimple assign with split ops.

> Note that the types of scratch_pad and address_of_ai might be different.  I
> know that's allowed at the generic level, but I'm not sure that's allowed at
> the gimple level.  Do you need to do a pointer conversion to stay safe WRT
> the gimple typesystem here?

No, all pointer types are compatible.

>
> One of the things that's unclear to me is how this all interacts with the
> alias oracle.

What matters with type-based aliasing is the actual dereference.  Of course
points-to info is another story here - you better should make sure to make
that available for the new pointer you create, otherwise you'll create runtime
alias checks for no good reason in vectorization.

> Richi?  Comments about that?
>
>
> Overall I like what I see and I think you're heading in the right direction.
> There's some things we ought to break out and commit now which will keep the
> review work easier in the long term.

So I thought about the way you if-convert and the way to vectorize
that.  We end up
with, say

void foo (float *p)
{
...
  char scratchpad[];
...
  for (i...;;)
    {
       void *p = cond ? &p[i] : &scratchpad;
       *p = ...;
    }

for example.  Now to vectorize this with scatters (_only_ supported
with AVX512!)
you need to be able to reduce the address to a single base address plus a
vector of scaled indexes.  Obviously the scratchpad and p do not have the same
base and for SFmode scatters you have only 4 bytes to represent the index.
This means you won't be able to use scatters here at all!

Also with AVX512 you have masked loads/stores at your disposal which are
much better suited here (and already supported by if-conversion!) as they
will be likely very much faster than gather/scatter operations.

So ... what target are you targeting that has gather/scatter support for
arbitrary unrelated objects and _not_ masked load/store support.

That said - it looks like your way of doing if-conversion isn't a good one for
vectorization.

Richard.


> JEff

Reply via email to