On Tue, Mar 29, 2016 at 9:37 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Mar 28, 2016 at 9:57 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> Sorry, Should have replied to gcc-patches list. >> >> Thanks, >> bin >> >> ---------- Forwarded message ---------- >> From: "Bin.Cheng" <amker.ch...@gmail.com> >> Date: Tue, 29 Mar 2016 03:55:04 +0800 >> Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking >> DR against its innermost loop bahavior if possible >> To: Richard Biener <richard.guent...@gmail.com> >> >> On 3/17/16, Richard Biener <richard.guent...@gmail.com> wrote: >>> On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> >>>>> Hmm. >>>> Hi, >>>> Thanks for reviewing. >>>>> >>>>> + equal_p = true; >>>>> + if (e1->base_address && e2->base_address) >>>>> + equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0); >>>>> + if (e1->offset && e2->offset) >>>>> + equal_p &= operand_equal_p (e1->offset, e2->offset, 0); >>>>> >>>>> surely better to return false early. >>>>> >>>>> I think we don't want this in tree-data-refs.h also because of ... >>>>> >>>>> @@ -615,15 +619,29 @@ >>>>> hash_memrefs_baserefs_and_store_DRs_read_written_info >>>>> (data_reference_p a) >>>>> data_reference_p *master_dr, *base_master_dr;and REALPART) before >>>>> creating the DR (or adjust the equality function >>>> and hashing >>>>> tree ref = DR_REF (a); >>>>> tree base_ref = DR_BASE_OBJECT (a); >>>>> + innermost_loop_behavior *innermost = &DR_INNERMOST (a); >>>>> tree ca = bb_predicate (gimple_bb (DR_STMT (a))); >>>>> bool exist1, exist2; >>>>> >>>>> - while (TREE_CODE (ref) == COMPONENT_REF >>>>> - || TREE_CODE (ref) == IMAGPART_EXPR >>>>> - || TREE_CODE (ref) == REALPART_EXPR) >>>>> - ref = TREE_OPERAND (ref, 0); >>>>> + /* If reference in DR has innermost loop behavior and it is not >>>>> + a compound memory reference, we store it to innermost_DR_map, >>>>> + otherwise to ref_DR_map. */ >>>>> + if (TREE_CODE (ref) == COMPONENT_REF >>>>> + || TREE_CODE (ref) == IMAGPART_EXPR >>>>> + || TREE_CODE (ref) == REALPART_EXPR >>>>> + || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a) >>>>> + || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a))) >>>>> + { >>>>> + while (TREE_CODE (ref) == COMPONENT_REF >>>>> + || TREE_CODE (ref) == IMAGPART_EXPR >>>>> + || TREE_CODE (ref) == REALPART_EXPR) >>>>> + ref = TREE_OPERAND (ref, 0); >>>>> + >>>>> + master_dr = &ref_DR_map->get_or_insert (ref, &exist1); >>>>> + } >>>>> + else >>>>> + master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1); >>>>> >>>>> we don't want an extra hashmap but replace ref_DR_map entirely. So we'd >>>>> need to >>>>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART >>>>> and REALPART) before creating the DR (or adjust the equality function >>>>> and hashing >>>>> to disregard them which means subtracting their offset from DR_INIT. >>>> I am not sure if I understand correctly. But for component reference, >>>> it is the base object that we want to record/track. For example, >>>> >>>> for (i = 0; i < N; i++) { >>>> m = *data++; >>>> >>>> m1 = p1->x - m; >>>> m2 = p2->x + m; >>>> >>>> p3->y = (m1 >= m2) ? p1->y : p2->y; >>>> >>>> p1++; >>>> p2++; >>>> p3++; >>>> } >>>> We want to infer that reads of p1/p2 in condition statement won't trap >>>> because there are unconditional reads of the structures, though the >>>> unconditional reads are actual of other sub-objects. Here it is the >>>> invariant part of address that we want to track. >>> >>> Well, the variant parts - we want to strip invariant parts as far as we can >>> (offsetof (x) and offsetof (y)) >>> >>>> Also illustrated by this example, we can't rely on data-ref analyzer >>>> here. Because in gathering/scattering cases, the address could be not >>>> affine at all. >>> >>> Sure, but that's a different issue. >>> >>>>> >>>>> To adjust the references we collect you'd maybe could use a callback >>>>> to get_references_in_stmt >>>>> to adjust them. >>>>> >>>>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple >>>>> as >>>> Is this a part of the method you suggested above, or is it an >>>> alternative one? If it's the latter, then I have below questions >>>> embedded. >>> >>> It is an alternative to adding a hook to get_references_in_stmt and >>> probably "easier". >>> >>>>> >>>>> Index: tree-if-conv.c >>>>> =================================================================== >>>>> --- tree-if-conv.c (revision 234215) >>>>> +++ tree-if-conv.c (working copy) >>>>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo >>>>> >>>>> for (i = 0; refs->iterate (i, &dr); i++) >>>>> { >>>>> + tree *refp = &DR_REF (dr); >>>>> + while ((TREE_CODE (*refp) == COMPONENT_REF >>>>> + && TREE_OPERAND (*refp, 2) == NULL_TREE) >>>>> + || TREE_CODE (*refp) == IMAGPART_EXPR >>>>> + || TREE_CODE (*refp) == REALPART_EXPR) >>>>> + refp = &TREE_OPERAND (*refp, 0); >>>>> + if (refp != &DR_REF (dr)) >>>>> + { >>>>> + tree saved_base = *refp; >>>>> + *refp = integer_zero_node; >>>>> + >>>>> + if (DR_INIT (dr)) >>>>> + { >>>>> + tree poffset; >>>>> + int punsignedp, preversep, pvolatilep; >>>>> + machine_mode pmode; >>>>> + HOST_WIDE_INT pbitsize, pbitpos; >>>>> + get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, >>>>> &poffset, >>>>> + &pmode, &punsignedp, &preversep, >>>>> &pvolatilep, >>>>> + false); >>>>> + gcc_assert (poffset == NULL_TREE); >>>>> + >>>>> + DR_INIT (dr) >>>>> + = wide_int_to_tree (ssizetype, >>>>> + wi::sub (DR_INIT (dr), >>>>> + pbitpos / BITS_PER_UNIT)); >>>>> + } >>>>> + >>>>> + *refp = saved_base; >>>>> + DR_REF (dr) = *refp; >>>>> + } >>>> Looks to me the code is trying to resolve difference between two (or >>>> more) component references, which is DR_INIT in the code. But DR_INIT >>>> is not the only thing needs to be handled. For a structure containing >>>> two sub-arrays, DR_OFFSET may be different too. >>> >>> Yes, but we can't say that if >>> >>> a->a[i] >>> >>> doesn't trap that then >>> >>> a->b[i] >>> >>> doesn't trap either. We can only "strip" outermost >>> non-variable-offset components. >>> >>> But maybe I'm missing what example you are thinking of. >> Hmm, this was the case I meant. What I don't understand is current >> code logic does infer trap information for a.b[i] from a.a[i]. Given >> below example: >> struct str >> { >> int a[10]; >> int b[20]; >> char c; >> }; >> >> void bar (struct str *); >> int foo (int x, int n) >> { >> int i; >> struct str s; >> bar (&s); >> for (i = 0; i < n; i++) >> { >> s.a[i] = s.b[i]; >> if (x > i) >> s.b[i] = 0; >> } >> bar (&s); >> return 0; >> } >> The loop is convertible because of below code in function >> ifcvt_memrefs_wont_trap: >> >> /* If a is unconditionally accessed then ... */ >> if (DR_RW_UNCONDITIONALLY (*master_dr)) >> { >> /* an unconditional read won't trap. */ >> if (DR_IS_READ (a)) >> return true; >> >> /* an unconditionaly write won't trap if the base is written >> to unconditionally. */ >> if (base_master_dr >> && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) >> return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); >> else >> { >> /* or the base is know to be not readonly. */ >> tree base_tree = get_base_address (DR_REF (a)); >> if (DECL_P (base_tree) >> && decl_binds_to_current_def_p (base_tree) >> && ! TREE_READONLY (base_tree)) >> return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); >> } >> } >> It is the main object '&s' that is recorded in base_master_dr, so >> s.b[i] is considered not trap. Even it's not, I suppose >> get_base_address will give same result? > > Well, for this case it sees that s.b[i] is read from so it can't be an > out-of-bound > access. And s.a[i] is written to unconditionally so 's' cannot be a > readonly object. > With both pieces of information we can conclude that s.b[i] = 0 doesn't trap.
Hi Richard, Attachment is the updated patch. I made below changes wrto your review comments: 1) Moved hash class for innermost loop behavior from tree-data-ref.h to tree-if-conv.c. I also removed check on innermost_loop_behavior.aligned_to which seems redundant to me because it's computed from offset and we have already checked equality for offset. 2) Replaced ref_DR_map with new map innermost_DR_map. 3) Post-processed DR in if_convertible_loop_p_1 for compound reference or references don't have innermost behavior. This cleans up following code using DR information. 4) Added a vectorization test for PR56625. I didn't incorporate your proposed code because I think it handles COMPONENT_REF of structure array like struct_arr[i].x/struct_arr[i].y. Looks to me it is another ifcvt opportunity different from PR69489. Anyway, fix is easy, I can send another patch in GCC7. Bootstrap and test on x86_64/AArch64, so any comments on this version? Thanks, bin 2016-03-30 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/56625 PR tree-optimization/69489 * tree-data-ref.h (DR_INNERMOST): New macro. * tree-if-conv.c (innermost_loop_behavior_hash): New class for hashing struct innermost_loop_behavior. (ref_DR_map): Remove. (innermost_DR_map): New map. (baseref_DR_map): Revise comment. (hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR to innermost_DR_map accroding to its innermost loop behavior. (ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map according to its innermost loop behavior. (if_convertible_loop_p_1): Remove intialization for ref_DR_map. Add initialization for innermost_DR_map. Record memory reference in DR_BASE_ADDRESS if the reference is compound one or it doesn't have innermost loop behavior. (if_convertible_loop_p): Remove release for ref_DR_map. Release innermost_DR_map. gcc/testsuite/ChangeLog 2016-03-30 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/56625 PR tree-optimization/69489 * gcc.dg/vect/pr56625.c: New test. * gcc.dg/tree-ssa/ifc-pr69489-1.c: New test.
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index eb24d16..856dd58 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -144,6 +144,7 @@ struct data_reference #define DR_STEP(DR) (DR)->innermost.step #define DR_PTR_INFO(DR) (DR)->alias.ptr_info #define DR_ALIGNED_TO(DR) (DR)->innermost.aligned_to +#define DR_INNERMOST(DR) (DR)->innermost typedef struct data_reference *data_reference_p; diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 9e305c7..884006c 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -114,16 +114,68 @@ along with GCC; see the file COPYING3. If not see #include "builtins.h" #include "params.h" +/* Hash for struct innermost_loop_behavior. It depends on the user to + free the memory. */ + +struct innermost_loop_behavior_hash : nofree_ptr_hash <innermost_loop_behavior> +{ + static inline hashval_t hash (const value_type &); + static inline bool equal (const value_type &, + const compare_type &); +}; + +inline hashval_t +innermost_loop_behavior_hash::hash (const value_type &e) +{ + hashval_t hash; + + hash = iterative_hash_expr (e->base_address, 0); + hash = iterative_hash_expr (e->offset, hash); + hash = iterative_hash_expr (e->init, hash); + return iterative_hash_expr (e->step, hash); +} + +inline bool +innermost_loop_behavior_hash::equal (const value_type &e1, + const compare_type &e2) +{ + if ((e1->base_address && !e2->base_address) + || (!e1->base_address && e2->base_address) + || (!e1->offset && e2->offset) + || (e1->offset && !e2->offset) + || (!e1->init && e2->init) + || (e1->init && !e2->init) + || (!e1->step && e2->step) + || (e1->step && !e2->step)) + return false; + + if (e1->base_address && e2->base_address + && !operand_equal_p (e1->base_address, e2->base_address, 0)) + return false; + if (e1->offset && e2->offset + && !operand_equal_p (e1->offset, e2->offset, 0)) + return false; + if (e1->init && e2->init + && !operand_equal_p (e1->init, e2->init, 0)) + return false; + if (e1->step && e2->step + && !operand_equal_p (e1->step, e2->step, 0)) + return false; + + return true; +} + /* List of basic blocks in if-conversion-suitable order. */ static basic_block *ifc_bbs; /* Apply more aggressive (extended) if-conversion if true. */ static bool aggressive_if_conv; -/* Hash table to store references, DR pairs. */ -static hash_map<tree_operand_hash, data_reference_p> *ref_DR_map; +/* Hash table to store <DR's innermost loop behavior, DR> pairs. */ +static hash_map<innermost_loop_behavior_hash, + data_reference_p> *innermost_DR_map; -/* Hash table to store base reference, DR pairs. */ +/* Hash table to store <base reference, DR> pairs. */ static hash_map<tree_operand_hash, data_reference_p> *baseref_DR_map; /* Structure used to predicate basic blocks. This is attached to the @@ -613,17 +665,12 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) { data_reference_p *master_dr, *base_master_dr; - tree ref = DR_REF (a); tree base_ref = DR_BASE_OBJECT (a); + innermost_loop_behavior *innermost = &DR_INNERMOST (a); tree ca = bb_predicate (gimple_bb (DR_STMT (a))); bool exist1, exist2; - while (TREE_CODE (ref) == COMPONENT_REF - || TREE_CODE (ref) == IMAGPART_EXPR - || TREE_CODE (ref) == REALPART_EXPR) - ref = TREE_OPERAND (ref, 0); - - master_dr = &ref_DR_map->get_or_insert (ref, &exist1); + master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1); if (!exist1) *master_dr = a; @@ -685,21 +732,18 @@ ifcvt_memrefs_wont_trap (gimple *stmt, vec<data_reference_p> drs) data_reference_p *master_dr, *base_master_dr; data_reference_p a = drs[gimple_uid (stmt) - 1]; - tree ref_base_a = DR_REF (a); tree base = DR_BASE_OBJECT (a); + innermost_loop_behavior *innermost = &DR_INNERMOST (a); gcc_assert (DR_STMT (a) == stmt); + gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) + || DR_INIT (a) || DR_STEP (a)); - while (TREE_CODE (ref_base_a) == COMPONENT_REF - || TREE_CODE (ref_base_a) == IMAGPART_EXPR - || TREE_CODE (ref_base_a) == REALPART_EXPR) - ref_base_a = TREE_OPERAND (ref_base_a, 0); + master_dr = innermost_DR_map->get (innermost); + gcc_assert (master_dr != NULL); - master_dr = ref_DR_map->get (ref_base_a); base_master_dr = baseref_DR_map->get (base); - gcc_assert (master_dr != NULL); - /* If a is unconditionally written to it doesn't trap. */ if (DR_W_UNCONDITIONALLY (*master_dr)) return true; @@ -1228,13 +1272,16 @@ if_convertible_loop_p_1 (struct loop *loop, data_reference_p dr; - ref_DR_map = new hash_map<tree_operand_hash, data_reference_p>; + innermost_DR_map + = new hash_map<innermost_loop_behavior_hash, data_reference_p>; baseref_DR_map = new hash_map<tree_operand_hash, data_reference_p>; predicate_bbs (loop); for (i = 0; refs->iterate (i, &dr); i++) { + tree ref = DR_REF (dr); + dr->aux = XNEW (struct ifc_dr); DR_BASE_W_UNCONDITIONALLY (dr) = false; DR_RW_UNCONDITIONALLY (dr) = false; @@ -1244,6 +1291,27 @@ if_convertible_loop_p_1 (struct loop *loop, IFC_DR (dr)->base_w_predicate = boolean_false_node; if (gimple_uid (DR_STMT (dr)) == 0) gimple_set_uid (DR_STMT (dr), i + 1); + + /* If DR doesn't have innermost loop behavior or it's a compound + memory reference, we synthesize its innermost loop behavior + for hashing. */ + if (TREE_CODE (ref) == COMPONENT_REF + || TREE_CODE (ref) == IMAGPART_EXPR + || TREE_CODE (ref) == REALPART_EXPR + || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) + || DR_INIT (dr) || DR_STEP (dr))) + { + while (TREE_CODE (ref) == COMPONENT_REF + || TREE_CODE (ref) == IMAGPART_EXPR + || TREE_CODE (ref) == REALPART_EXPR) + ref = TREE_OPERAND (ref, 0); + + DR_BASE_ADDRESS (dr) = ref; + DR_OFFSET (dr) = NULL; + DR_INIT (dr) = NULL; + DR_STEP (dr) = NULL; + DR_ALIGNED_TO (dr) = NULL; + } hash_memrefs_baserefs_and_store_DRs_read_written_info (dr); } @@ -1338,8 +1406,8 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) free_data_refs (refs); - delete ref_DR_map; - ref_DR_map = NULL; + delete innermost_DR_map; + innermost_DR_map = NULL; delete baseref_DR_map; baseref_DR_map = NULL; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c new file mode 100644 index 0000000..02a6731 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */ + +void foo (int a[], int b[]) +{ + int i; + for (i = 0; i < 100; i++) + { + if (a[i] == 0) + a[i] = b[i]*4; + else + a[i] = b[i]*3; + } +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/pr56625.c b/gcc/testsuite/gcc.dg/vect/pr56625.c new file mode 100644 index 0000000..b903be3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr56625.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +void foo (int a[], int b[]) +{ + int i; + for (i = 0; i < 100; i++) + { + if (a[i] == 0) + a[i] = b[i]*4; + else + a[i] = b[i]*3; + } +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */