On Tue, Aug 4, 2015 at 7:05 AM, Jeff Law <l...@redhat.com> wrote: > On 07/17/2015 01:57 PM, Abe wrote: >> >> Dear all, >> >> Relative to the previous submission of this same patch, the below >> corrects some minor spacing and/or indentation issues, >> misc. other formatting fixes, and makes the disabled vectorization tests >> be disabled via "xfail" rather than by adding spaces to >> deliberately cause the relevant scanned-for text to not be found by >> DejaGNU so as to prevent the DejaGNU line from being interpreted. >> >> The below is also based on a Git checkout that was rebased to the latest >> upstream check-in from today, >> so it should merge cleanly with trunk as of today. >> >> Regards, >> >> Abe >> >> >> >> >> >> >> >> >> 2015-06-12 Sebastian Pop <s....@samsung.com> >> Abe Skolnik <a.skol...@samsung.com> >> >> PR tree-optimization/46029 >> * tree-data-ref.c (struct data_ref_loc_d): Moved... >> (get_references_in_stmt): Exported. >> * tree-data-ref.h (struct data_ref_loc_d): ... here. >> (get_references_in_stmt): Declared. >> >> * tree-if-conv.c (struct ifc_dr): Removed. >> (IFC_DR): Removed. >> (DR_WRITTEN_AT_LEAST_ONCE): Removed. >> (DR_RW_UNCONDITIONALLY): Removed. >> (memrefs_read_or_written_unconditionally): Removed. >> (write_memrefs_written_at_least_once): Removed. >> (ifcvt_could_trap_p): Does not take refs parameter anymore. >> (ifcvt_memrefs_wont_trap): Removed. >> (has_non_addressable_refs): New. >> (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. >> Removed use of refs. >> (if_convertible_stmt_p): Removed use of refs. >> (if_convertible_gimple_assign_stmt_p): Same. >> (if_convertible_loop_p_1): Removed use of refs. Remove >> initialization >> of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. >> (insert_address_of): New. >> (create_scratchpad): New. >> (create_indirect_cond_expr): New. >> (predicate_mem_writes): Call create_indirect_cond_expr. Take an >> extra >> parameter for scratch_pad. >> (combine_blocks): Same. >> (tree_if_conversion): Same. >> >> testsuite/ >> * g++.dg/tree-ssa/ifc-pr46029.C: New. >> * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. >> * gcc.dg/tree-ssa/ifc-8.c: New. >> * gcc.dg/tree-ssa/ifc-9.c: New. >> * gcc.dg/tree-ssa/ifc-10.c: New. >> * gcc.dg/tree-ssa/ifc-11.c: New. >> * gcc.dg/tree-ssa/ifc-12.c: New. >> * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. > > > >> diff --git a/gcc/common.opt b/gcc/common.opt >> index 6b2ccbc..49f6b9f 100644 >> --- a/gcc/common.opt >> +++ b/gcc/common.opt >> @@ -1413,7 +1413,7 @@ Common Report Var(flag_tree_loop_if_convert) >> Init(-1) Optimization >> Convert conditional jumps in innermost loops to branchless equivalents >> >> ftree-loop-if-convert-stores >> -Common Report Var(flag_tree_loop_if_convert_stores) Optimization >> +Common Report Var(flag_tree_loop_if_convert_stores) Init(-1) Optimization >> Also if-convert conditional jumps containing memory writes >> >> ; -finhibit-size-directive inhibits output of .size for ELF. > > I don't see this change mentioned anywhere in the ChangeLog. That seems to > be a relatively common situation. I called out some of those issues, but > didn't try to catch them all. Please make sure all changes are reflected in > the ChangeLog. > > > > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c >> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c >> index f392fbe..775fcd5 100644 >> --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c > > This change isn't mentioned in the ChangeLog. > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c >> >> b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c >> index 875d2d3..fc69ca2 100644 >> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c >> @@ -1,5 +1,5 @@ >> /* { dg-do compile } */ >> -/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { >> target *-*-* } } */ >> +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores >> -fdump-tree-ifcvt-stats" { target *-*-* } } */ > > ISTM this really should be two tests, one with this code as-is, another that > exactly matches the ffmpeg kernel. > > > > >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c >> b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c >> index 11e9533..cbd3378 100644 >> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c >> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c > > I don't see this mentioned in the ChangeLog. It also doesn't look like you > actually disabled the test. Obviously this will need to be addressed before > your patch could go in. > >> diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c >> b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c >> index 180b490..aedc66a 100644 >> --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c >> +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c > > Not mentioned in the ChangeLog. xfail needs to be fixed. > > Similarly for the others were you added xfails. > > >> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c >> index e6ff4ef..2a1e27d 100644 >> --- a/gcc/tree-data-ref.c >> +++ b/gcc/tree-data-ref.c >> @@ -4363,22 +4363,11 @@ compute_all_dependences (vec<data_reference_p> >> datarefs, >> return true; >> } >> >> -/* Describes a location of a memory reference. */ >> - >> -typedef struct data_ref_loc_d >> -{ >> - /* The memory reference. */ >> - tree ref; >> - >> - /* True if the memory reference is read. */ >> - bool is_read; >> -} data_ref_loc; > > Moving data_ref_loc_d could potentially be split out and merged in > immediately assuming we've agreed that you're likely going to need > data_ref_loc from within tree-data-ref.c and tree-if-conv.c. > > Similarly for exporting get_references_in_stmt. > > Picking out these things that are necessary prerequisites, but not part of > the core changes you need to make will make the overall review process > simpler. > > > >> +ifcvt_could_trap_p (gimple stmt) > > Presumably you're able to consider something with a vuse which does not > otherwise trap as non-trapping because of the use of the scratchpad? > > Perhaps you should clarify the comment to indicate what the "refined for the > needs of the if-conversion" actually means. Remember, someone other than > you might be reading this code in the future. > > >> @@ -1459,8 +1309,7 @@ is_cond_scalar_reduction (gimple phi, gimple >> *reduc, tree arg_0, tree arg_1, >> if (PHI_ARG_DEF_FROM_EDGE (header_phi, latch_e) != PHI_RESULT (phi)) >> return false; >> >> - if (gimple_code (stmt) != GIMPLE_ASSIGN >> - || gimple_has_volatile_ops (stmt)) >> + if (gimple_code (stmt) != GIMPLE_ASSIGN || gimple_has_volatile_ops >> (stmt)) >> return false; > > This looks like a gratutious change. It probably doesn't matter either way. > If you really want to make this change, please do so independent of your > patch. > > >> >> if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt))) >> @@ -1882,8 +1731,7 @@ insert_gimplified_predicates (loop_p loop, bool >> any_mask_load_store) >> stmts = bb_predicate_gimplified_stmts (bb); >> if (stmts) >> { >> - if (flag_tree_loop_if_convert_stores >> - || any_mask_load_store) >> + if (flag_tree_loop_if_convert_stores || any_mask_load_store) > > Similarly. Pure whitespace/formatting changes should be handled > independently. > >> @@ -1925,6 +1773,80 @@ mask_exists (int size, vec<int> vec) >> return -1; >> } >> >> +/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and >> + returns the ADDRESS_OF_AI. */ >> + >> +static tree >> +insert_address_of (tree ai, gimple_stmt_iterator *gsi) > > Presumably we've verified ai is addressable prior to calling this routine > :-)
+static tree +insert_address_of (tree ai, gimple_stmt_iterator *gsi) +{ + tree addr_expr = build_fold_addr_expr (ai); + tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_"); + gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr); + + address_of_ai = make_ssa_name (address_of_ai, addr_stmt); + gimple_assign_set_lhs (addr_stmt, address_of_ai); + SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt; please simply use gimple addr_stmt = gimple_build_assign (make_temp_ssa_name (TREE_TYPE (addr_expr), NULL, "ifc"), addr_expr); address_of_ai = gimple_assign_lhs (addr_stmt); maybe you can simplify other code as well in this way. As Jeff said you better made sure 'ai' is TREE_ADDRESSABLE (I suppose this is always the scratchpad). Otherwise you seriously confuse the alias machinery. So do, at the beginning of this function gcc_assert (TREE_ADDRESSABLE (get_base_address (ai))); > >> + >> +/* Insert at the beginning of the first basic block of the current >> + function the allocation on the stack of N_BYTES of memory and >> + return a pointer to this scratchpad memory. */ >> + >> +static tree >> +create_scratchpad (unsigned int n_bytes) >> +{ >> + basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) ); > > Whitespace is wrong. Should be > = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); +static tree +create_scratchpad (unsigned int n_bytes) +{ + basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) ); + gimple_stmt_iterator gsi = gsi_after_labels (bb); + tree x = build_int_cst (integer_type_node, n_bytes - 1); + tree elt_type = char_type_node; + tree array_type = build_array_type (elt_type, build_index_type (x)); + tree base = create_tmp_reg (array_type, "scratch_pad"); + + return insert_address_of (base, &gsi); also use create_tmp_var and add TREE_ADDRESSABLE (base) = 1; note that the vectorizer might want to align the scratchpad and aligning stack locals can be expensive. So I wonder if using a global is better? (yeah, you get false dependencies in the CPU with threads again, so you could even use TLS vars...). Not sure if we really need to worry. > >> +/* Returns a memory reference to the pointer defined by the >> + conditional expression: pointer = cond ? &A[i] : scratch_pad; and >> + inserts this code at GSI. */ >> + >> +static tree >> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad, >> + gimple_stmt_iterator *gsi, bool swap) >> +{ >> + tree cond_expr; >> + tree pointer; >> + gimple pointer_stmt; >> + >> + /* address_of_ai = &A[i]; */ >> + tree address_of_ai = insert_address_of (ai, gsi); >> + >> + /* Allocate the scratch pad only once per function. */ >> + if (!*scratch_pad) >> + *scratch_pad = create_scratchpad (64); > > Obviously the hardcoded 64 needs to change. You can iterate over vector modes looking for the biggest one, also consider word_mode. >> + >> + /* pointer = cond ? address_of_ai : scratch_pad; */ >> + pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_"); See above - make_temp_ssa_name >> + cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai), >> + unshare_expr (cond), >> + swap ? *scratch_pad : address_of_ai, >> + swap ? address_of_ai : *scratch_pad); Don't use build3 but directly build the gimple assign with split ops. > Note that the types of scratch_pad and address_of_ai might be different. I > know that's allowed at the generic level, but I'm not sure that's allowed at > the gimple level. Do you need to do a pointer conversion to stay safe WRT > the gimple typesystem here? No, all pointer types are compatible. > > One of the things that's unclear to me is how this all interacts with the > alias oracle. What matters with type-based aliasing is the actual dereference. Of course points-to info is another story here - you better should make sure to make that available for the new pointer you create, otherwise you'll create runtime alias checks for no good reason in vectorization. > Richi? Comments about that? > > > Overall I like what I see and I think you're heading in the right direction. > There's some things we ought to break out and commit now which will keep the > review work easier in the long term. So I thought about the way you if-convert and the way to vectorize that. We end up with, say void foo (float *p) { ... char scratchpad[]; ... for (i...;;) { void *p = cond ? &p[i] : &scratchpad; *p = ...; } for example. Now to vectorize this with scatters (_only_ supported with AVX512!) you need to be able to reduce the address to a single base address plus a vector of scaled indexes. Obviously the scratchpad and p do not have the same base and for SFmode scatters you have only 4 bytes to represent the index. This means you won't be able to use scatters here at all! Also with AVX512 you have masked loads/stores at your disposal which are much better suited here (and already supported by if-conversion!) as they will be likely very much faster than gather/scatter operations. So ... what target are you targeting that has gather/scatter support for arbitrary unrelated objects and _not_ masked load/store support. That said - it looks like your way of doing if-conversion isn't a good one for vectorization. Richard. > JEff