Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

Prathamesh Kulkarni via Gcc-patches Sun, 06 Aug 2023 17:05:05 -0700

On Thu, 3 Aug 2023 at 17:48, Richard Biener <rguent...@suse.de> wrote:
>
> On Thu, 3 Aug 2023, Richard Biener wrote:
>
> > On Thu, 3 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > issue preventing vectorization when masked loads are not available.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > >
> > > > > > > OK?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >  PR tree-optimization/92335
> > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > >  in innermost loops to conditional executed places.
> > > > > > >
> > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > I think it's reasonable -- there's probably going to be cases where it's not
> > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > heuristic.
> > > > > >
> > > > > > If there is undesirable fallout, better to find it over the coming months than
> > > > > > next spring.  So I'd suggest we go forward now to give more time to find any
> > > > > > pathological cases (if they exist).
> > > > >
> > > > > Agreed, I've pushed this now.
> > > > Hi Richard,
> > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects scan-tree-dump-not vect "LOOP VECTORIZED"
> > > >
> > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */
> > > >
> > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > regardless of vect_fold_extract_last,
> > > > which gates the above test (which is an improvement, because the
> > > > function didn't get vectorized before the commit).
> > > >
> > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > and the test passes again.
> > > > OK to commit ?
> > >
> > > OK.
> >
> > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > critical target condition is missing.  Can you figure out which?
>
> I see
>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: note:   vect_is_simple_use: vectype vector(4) int
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: missed:  bad operation or unsupported loop bound.
Hi Richard,
Looking at the aarch64 vect dump, it seems the loop in
condition_reduction gets vectorized with V4HI mode
but fails for the other modes in vectorizable_reduction:


  if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
      && ncopies > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "multiple types in double reduction or condition "
                         "reduction or fold-left reduction.\n");
      return false;
    }
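The check above can be modelled as a small standalone C function. This is a hypothetical sketch, not GCC code: `ncopies_for` and `reduction_accepted` are made-up names, the reduction kind is reduced to two booleans, and ncopies is assumed to be VF divided by the number of lanes in vectype_in.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of vector statements needed per scalar statement: VF scalar
   iterations covered by vectors of nunits_in lanes each (assumed model).  */
static int ncopies_for (int vf, int nunits_in)
{
  assert (nunits_in > 0 && vf % nunits_in == 0);
  return vf / nunits_in;
}

/* Hypothetical mirror of the quoted check: double reductions, and any
   reduction that is not a plain TREE_CODE_REDUCTION (for example a
   condition reduction), are rejected when ncopies > 1.  */
static bool reduction_accepted (bool double_reduc, bool tree_code_reduction,
                                int vf, int nunits_in)
{
  int ncopies = ncopies_for (vf, nunits_in);
  return !((double_reduc || !tree_code_reduction) && ncopies > 1);
}
```

Plugging in the numbers from the dumps, a condition reduction with VF = 8 and vector(4) int is rejected (ncopies = 2), while VF = 4 with vector(4) int is accepted (ncopies = 1).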

From the dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int

For V8HI, VF = 8 and vectype_in = vector(4) int.
Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1,
so the check fails:
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
For V4HI, by contrast, VF = 4 and thus ncopies = 1, so the check passes.
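Where VF = 8 comes from can be sketched as well: in the V8HI analysis the relevant statements report nunits of 4, 8, 4, 8, 4 and 4 (last_19, aval_13, _7, last_16, _9, last_8), and the factor has to be a common multiple of all of them. The sketch assumes the factor is the least common multiple (for these power-of-two lane counts that is simply the maximum); `gcd_u` and `vf_from_nunits` are hypothetical helpers, not vectorizer functions.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid), used to form a least common multiple.  */
static unsigned gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical model of the vectorization factor: the least common
   multiple of the lane counts (nunits) of all relevant statements.  */
static unsigned vf_from_nunits (const unsigned *nunits, size_t n)
{
  unsigned vf = 1;
  for (size_t i = 0; i < n; i++)
    {
      assert (nunits[i] > 0);
      vf = vf / gcd_u (vf, nunits[i]) * nunits[i];
    }
  return vf;
}
```

Dividing each result by the 4 lanes of vector(4) int gives the ncopies values above: 8 / 4 = 2 for V8HI and 4 / 4 = 1 for V4HI.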

On x86_64, the vectorizer does not seem to try V4HI mode.
If I "force" it to use V4HI mode, I get the following dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.

Here vectype_in is vector(2) int, so ncopies is again greater than 1.
I'm not sure, though, whether this is the only reason the test fails to
vectorize on x86_64. I will investigate in more detail next week.

Thanks,
Prathamesh
>
> Richard.
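For reference, the loop in the attached dump looks roughly like the C below. This is a hypothetical reconstruction from the GIMPLE (43 iterations, initial value 108, short loads from a, int loads from b compared against min_v); the parameter order and names are guesses, and the actual gcc.dg/vect/pr65947-7.c may differ. The second function shows the same computation after the load of a[i] has been sunk under the condition, the kind of placement the sinking patch now avoids before loop optimizations.

```c
#define N 43

/* The load of a[i] happens on every iteration, before the comparison
   (aval_13 = *_3 in the dump), so a select can be used per lane.  */
int condition_reduction (short *a, int min_v, int *b)
{
  int last = 108;
  for (int i = 0; i < N; i++)
    {
      short aval = a[i];     /* unconditional load */
      if (b[i] < min_v)
        last = aval;         /* last_8 = _9 ? last_16 : last_19 */
    }
  return last;
}

/* Same result, but a[i] is now read only when the condition holds.  */
int condition_reduction_sunk (short *a, int min_v, int *b)
{
  int last = 108;
  for (int i = 0; i < N; i++)
    if (b[i] < min_v)
      last = a[i];           /* conditional load */
  return last;
}
```

Both functions return the same value; the difference only matters for vectorization, since without masked loads the conditional a[i] read in the second version typically cannot be if-converted.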

;; Function condition_reduction (condition_reduction, funcdef_no=0, decl_uid=4390, cgraph_uid=1, symbol_order=0)


Analyzing loop at foo.c:9
foo.c:9:21: note:  === analyze_loop_nest ===
foo.c:9:21: note:   === vect_analyze_loop_form ===
foo.c:9:21: note:    === get_loop_niters ===
Analyzing # of iterations of loop 1
  exit condition [42, + , 4294967295] != 0
  bounds on difference of bases: -42 ... -42
  result:
    # of iterations 42, bounded by 42
Creating dr for *_3
analyze_innermost: success.
        base_address: a_12(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 2
        base alignment: 2
        base misalignment: 0
        offset alignment: 128
        step alignment: 2
        base_object: *a_12(D)
        Access function 0: {0B, +, 2}_1
Creating dr for *_6
analyze_innermost: success.
        base_address: b_14(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        base alignment: 4
        base misalignment: 0
        offset alignment: 128
        step alignment: 4
        base_object: *b_14(D)
        Access function 0: {0B, +, 4}_1
foo.c:9:21: note:   === vect_analyze_data_refs ===
foo.c:9:21: note:   got vectype for stmt: aval_13 = *_3;
vector(8) short int
foo.c:9:21: note:   got vectype for stmt: _7 = *_6;
vector(4) int
foo.c:9:21: note:   === vect_analyze_scalar_cycles ===
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   Access function of PHI: last_19
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Access function of PHI: {0, +, 1}_1
foo.c:9:21: note:   step: 1,  init: 0
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   Access function of PHI: {43, +, 4294967295}_1
foo.c:9:21: note:   step: 4294967295,  init: 43
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduction path: last_8 last_19 
foo.c:9:21: note:   reduction: detected reduction
foo.c:9:21: note:   Detected reduction.
foo.c:9:21: note:   === vect_determine_precisions ===
foo.c:9:21: note:   using boolean precision 32 for _9 = _7 < min_v_15(D);
foo.c:9:21: note:   ivtmp_10 has no range info
foo.c:9:21: note:   i_17 has range [0x1, 0x2b]
foo.c:9:21: note:   can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1;
foo.c:9:21: note:   last_8 has no range info
foo.c:9:21: note:   last_16 has no range info
foo.c:9:21: note:   _7 has no range info
foo.c:9:21: note:   _5 has range [0x0, 0xa8]
foo.c:9:21: note:   can narrow to unsigned:8 without loss of precision: _5 = _1 * 4;
foo.c:9:21: note:   aval_13 has no range info
foo.c:9:21: note:   _2 has range [0x0, 0x54]
foo.c:9:21: note:   can narrow to unsigned:7 without loss of precision: _2 = _1 * 2;
foo.c:9:21: note:   _1 has range [0x0, 0x2a]
foo.c:9:21: note:   === vect_pattern_recog ===
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _2 = _1 * 2;
foo.c:9:21: note:   widen_mult pattern recognized: patt_37 = (long unsigned int) patt_4;
foo.c:9:21: note:   extra pattern stmt: patt_4 = i_21 w* 2;
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _5 = _1 * 4;
foo.c:9:21: note:   widen_mult pattern recognized: patt_39 = (long unsigned int) patt_38;
foo.c:9:21: note:   extra pattern stmt: patt_38 = i_21 w* 4;
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction
foo.c:9:21: note:   === vect_analyze_data_ref_accesses ===
foo.c:9:21: note:   === vect_mark_stmts_to_be_vectorized ===
foo.c:9:21: note:   init: phi relevant? last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   init: phi relevant? i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   init: stmt relevant? _1 = (long unsigned int) i_21;
foo.c:9:21: note:   init: stmt relevant? _2 = _1 * 2;
foo.c:9:21: note:   init: stmt relevant? _3 = a_12(D) + _2;
foo.c:9:21: note:   init: stmt relevant? aval_13 = *_3;
foo.c:9:21: note:   init: stmt relevant? _5 = _1 * 4;
foo.c:9:21: note:   init: stmt relevant? _6 = b_14(D) + _5;
foo.c:9:21: note:   init: stmt relevant? _7 = *_6;
foo.c:9:21: note:   init: stmt relevant? last_16 = (int) aval_13;
foo.c:9:21: note:   init: stmt relevant? _9 = _7 < min_v_15(D);
foo.c:9:21: note:   init: stmt relevant? last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vec_stmt_relevant_p: used out of loop.
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   vec_stmt_relevant_p: stmt live but not relevant.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   init: stmt relevant? i_17 = i_21 + 1;
foo.c:9:21: note:   init: stmt relevant? ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   init: stmt relevant? if (ivtmp_10 != 0)
foo.c:9:21: note:   worklist: examine stmt: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction
foo.c:9:21: note:   reduc-stmt defining reduc-phi in the same nest.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   already marked relevant/live.
foo.c:9:21: note:   vect_is_simple_use: operand 108, type of def: constant
foo.c:9:21: note:   worklist: examine stmt: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _7 = *_6;
foo.c:9:21: note:   vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:   worklist: examine stmt: _7 = *_6;
foo.c:9:21: note:   === vect_analyze_data_ref_dependences ===
foo.c:9:21: note:   === vect_determine_vectorization_factor ===
foo.c:9:21: note:   ==> examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   get vectype for scalar type:  int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   ==> examining statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _2 = _1 * 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_4 = i_21 w* 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_37 = (long unsigned int) patt_4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _3 = a_12(D) + _2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: aval_13 = *_3;
foo.c:9:21: note:   precomputed vectype: vector(8) short int
foo.c:9:21: note:   nunits = 8
foo.c:9:21: note:   ==> examining statement: _5 = _1 * 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_38 = i_21 w* 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_39 = (long unsigned int) patt_38;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _6 = b_14(D) + _5;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _7 = *_6;
foo.c:9:21: note:   precomputed vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_16 = (int) aval_13;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   get vectype for smallest scalar type: short int
foo.c:9:21: note:   nunits vectype: vector(8) short int
foo.c:9:21: note:   nunits = 8
foo.c:9:21: note:   ==> examining statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vectype: vector(4) <signed-boolean:32>
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: i_17 = i_21 + 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note:   skip.
foo.c:9:21: note:   vectorization factor = 8
foo.c:9:21: note:   === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note:   === vect_analyze_slp ===
foo.c:9:21: note:   === vect_make_slp_decision ===
foo.c:9:21: note:  vectorization_factor = 8, niters = 43
foo.c:9:21: note:   === vect_analyze_data_refs_alignment ===
foo.c:9:21: note:   recording new base alignment for a_12(D)
  alignment:    2
  misalignment: 0
  based on:     aval_13 = *_3;
foo.c:9:21: note:   recording new base alignment for b_14(D)
  alignment:    4
  misalignment: 0
  based on:     _7 = *_6;
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_3
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_6
foo.c:9:21: note:   === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note:   === vect_enhance_data_refs_alignment ===
foo.c:9:21: missed:   Unknown misalignment, naturally aligned
foo.c:9:21: missed:   Unknown misalignment, naturally aligned
foo.c:9:21: note:   vect_can_advance_ivs_p:
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduc or virtual phi. skip.
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   vect_model_load_cost: aligned.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 3, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 3, outside_cost = 0.
foo.c:9:21: note:   === vect_dissolve_slp_only_groups ===
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: missed:  bad operation or unsupported loop bound.
foo.c:9:21: note:  ***** Analysis  failed with vector mode V8HI
foo.c:9:21: note:  ***** The result for vector mode V16QI would be the same
foo.c:9:21: note:  ***** The result for vector mode V8QI would be the same
foo.c:9:21: note:  ***** Re-trying analysis with vector mode V4HI
foo.c:9:21: note:   === vect_analyze_data_refs ===
foo.c:9:21: note:   got vectype for stmt: aval_13 = *_3;
vector(4) short int
foo.c:9:21: note:   got vectype for stmt: _7 = *_6;
vector(4) int
foo.c:9:21: note:   === vect_analyze_scalar_cycles ===
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   Access function of PHI: last_19
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Access function of PHI: {0, +, 1}_1
foo.c:9:21: note:   step: 1,  init: 0
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   Access function of PHI: {43, +, 4294967295}_1
foo.c:9:21: note:   step: 4294967295,  init: 43
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduction path: last_8 last_19 
foo.c:9:21: note:   reduction: detected reduction
foo.c:9:21: note:   Detected reduction.
foo.c:9:21: note:   === vect_determine_precisions ===
foo.c:9:21: note:   using boolean precision 32 for _9 = _7 < min_v_15(D);
foo.c:9:21: note:   ivtmp_10 has no range info
foo.c:9:21: note:   i_17 has range [0x1, 0x2b]
foo.c:9:21: note:   can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1;
foo.c:9:21: note:   last_8 has no range info
foo.c:9:21: note:   last_16 has no range info
foo.c:9:21: note:   _7 has no range info
foo.c:9:21: note:   _5 has range [0x0, 0xa8]
foo.c:9:21: note:   can narrow to unsigned:8 without loss of precision: _5 = _1 * 4;
foo.c:9:21: note:   aval_13 has no range info
foo.c:9:21: note:   _2 has range [0x0, 0x54]
foo.c:9:21: note:   can narrow to unsigned:7 without loss of precision: _2 = _1 * 2;
foo.c:9:21: note:   _1 has range [0x0, 0x2a]
foo.c:9:21: note:   === vect_pattern_recog ===
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _2 = _1 * 2;
foo.c:9:21: note:   widen_mult pattern recognized: patt_41 = (long unsigned int) patt_40;
foo.c:9:21: note:   extra pattern stmt: patt_40 = i_21 w* 2;
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _5 = _1 * 4;
foo.c:9:21: note:   widen_mult pattern recognized: patt_43 = (long unsigned int) patt_42;
foo.c:9:21: note:   extra pattern stmt: patt_42 = i_21 w* 4;
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction
foo.c:9:21: note:   === vect_analyze_data_ref_accesses ===
foo.c:9:21: note:   === vect_mark_stmts_to_be_vectorized ===
foo.c:9:21: note:   init: phi relevant? last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   init: phi relevant? i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   init: stmt relevant? _1 = (long unsigned int) i_21;
foo.c:9:21: note:   init: stmt relevant? _2 = _1 * 2;
foo.c:9:21: note:   init: stmt relevant? _3 = a_12(D) + _2;
foo.c:9:21: note:   init: stmt relevant? aval_13 = *_3;
foo.c:9:21: note:   init: stmt relevant? _5 = _1 * 4;
foo.c:9:21: note:   init: stmt relevant? _6 = b_14(D) + _5;
foo.c:9:21: note:   init: stmt relevant? _7 = *_6;
foo.c:9:21: note:   init: stmt relevant? last_16 = (int) aval_13;
foo.c:9:21: note:   init: stmt relevant? _9 = _7 < min_v_15(D);
foo.c:9:21: note:   init: stmt relevant? last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vec_stmt_relevant_p: used out of loop.
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   vec_stmt_relevant_p: stmt live but not relevant.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   init: stmt relevant? i_17 = i_21 + 1;
foo.c:9:21: note:   init: stmt relevant? ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   init: stmt relevant? if (ivtmp_10 != 0)
foo.c:9:21: note:   worklist: examine stmt: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction
foo.c:9:21: note:   reduc-stmt defining reduc-phi in the same nest.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   already marked relevant/live.
foo.c:9:21: note:   vect_is_simple_use: operand 108, type of def: constant
foo.c:9:21: note:   worklist: examine stmt: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _7 = *_6;
foo.c:9:21: note:   vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:   worklist: examine stmt: _7 = *_6;
foo.c:9:21: note:   === vect_analyze_data_ref_dependences ===
foo.c:9:21: note:   === vect_determine_vectorization_factor ===
foo.c:9:21: note:   ==> examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   get vectype for scalar type:  int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   ==> examining statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _2 = _1 * 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_40 = i_21 w* 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_41 = (long unsigned int) patt_40;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _3 = a_12(D) + _2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: aval_13 = *_3;
foo.c:9:21: note:   precomputed vectype: vector(4) short int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: _5 = _1 * 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_42 = i_21 w* 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_43 = (long unsigned int) patt_42;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _6 = b_14(D) + _5;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _7 = *_6;
foo.c:9:21: note:   precomputed vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_16 = (int) aval_13;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   get vectype for smallest scalar type: short int
foo.c:9:21: note:   nunits vectype: vector(4) short int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vectype: vector(4) <signed-boolean:32>
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: i_17 = i_21 + 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note:   skip.
foo.c:9:21: note:   vectorization factor = 4
foo.c:9:21: note:   === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note:   === vect_analyze_slp ===
foo.c:9:21: note:   === vect_make_slp_decision ===
foo.c:9:21: note:  vectorization_factor = 4, niters = 43
foo.c:9:21: note:   === vect_analyze_data_refs_alignment ===
foo.c:9:21: note:   recording new base alignment for a_12(D)
  alignment:    2
  misalignment: 0
  based on:     aval_13 = *_3;
foo.c:9:21: note:   recording new base alignment for b_14(D)
  alignment:    4
  misalignment: 0
  based on:     _7 = *_6;
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_3
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_6
foo.c:9:21: note:   === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note:   === vect_enhance_data_refs_alignment ===
foo.c:9:21: missed:   Unknown misalignment, naturally aligned
foo.c:9:21: missed:   Unknown misalignment, naturally aligned
foo.c:9:21: note:   vect_can_advance_ivs_p:
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduc or virtual phi. skip.
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   vect_model_load_cost: aligned.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
foo.c:9:21: note:   === vect_dissolve_slp_only_groups ===
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
Estimating # of iterations of loop 1
Analyzing # of iterations of loop 1
  exit condition [42, + , 4294967295] != 0
  bounds on difference of bases: -42 ... -42
  result:
    # of iterations 42, bounded by 42
Analyzing # of iterations of loop 1
  exit condition [42, + , 4294967295] != 0
  bounds on difference of bases: -42 ... -42
  result:
    # of iterations 42, bounded by 42
Statement (exit)if (ivtmp_10 != 0)
 is executed at most 42 (bounded by 42) + 1 times in loop 1.
Induction variable (short int *) a_12(D) + 2 * iteration does not wrap in statement _3 = a_12(D) + _2;
 in loop 1.
Statement _3 = a_12(D) + _2;
 is executed at most 9223372036854775806 (bounded by 9223372036854775806) + 1 times in loop 1.
Induction variable (int *) b_14(D) + 4 * iteration does not wrap in statement _6 = b_14(D) + _5;
 in loop 1.
Statement _6 = b_14(D) + _5;
 is executed at most 4611686018427387902 (bounded by 4611686018427387902) + 1 times in loop 1.
Induction variable (int) 1 + 1 * iteration does not wrap in statement i_17 = i_21 + 1;
 in loop 1.
Statement i_17 = i_21 + 1;
 is executed at most 42 (bounded by 42) + 1 times in loop 1.
vect_model_reduction_cost: inside_cost = 0, prologue_cost = 4, epilogue_cost = 7 .
foo.c:9:21: note:   examining phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   ==> examining statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: _2 = _1 * 2;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: _3 = a_12(D) + _2;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: aval_13 = *_3;
foo.c:9:21: missed:   can't operate on partial vectors because the target doesn't have the appropriate partial vectorization load or store.
foo.c:9:21: note:   Vectorizing an unaligned access.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .
foo.c:9:21: note:   ==> examining statement: _5 = _1 * 4;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: _6 = b_14(D) + _5;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: _7 = *_6;
foo.c:9:21: note:   Vectorizing an unaligned access.
foo.c:9:21: note:   vect_model_load_cost: unaligned supported by hardware.
foo.c:9:21: note:   vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .
foo.c:9:21: note:   ==> examining statement: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) short int
foo.c:9:21: note:    === vectorizable_conversion ===
foo.c:9:21: note:    vect_model_simple_cost: inside_cost = 1, prologue_cost = 0 .
foo.c:9:21: note:   ==> examining statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:   vect_model_simple_cost: inside_cost = 1, prologue_cost = 1 .
foo.c:9:21: note:   ==> examining statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) <signed-boolean:32>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_model_simple_cost: inside_cost = 1, prologue_cost = 0 .
foo.c:9:21: note:   ==> examining statement: i_17 = i_21 + 1;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   irrelevant.
foo.c:9:21: note:   ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note:   irrelevant.
_9 ? last_16 : last_19 4 times scalar_to_vec costs 4 in prologue
_9 ? last_16 : last_19 2 times vector_stmt costs 2 in epilogue
_9 ? last_16 : last_19 2 times vec_to_scalar costs 4 in epilogue
_9 ? last_16 : last_19 1 times scalar_to_vec costs 1 in epilogue
*_3 1 times unaligned_load (misalign -1) costs 1 in body
*_6 1 times unaligned_load (misalign -1) costs 1 in body
(int) aval_13 1 times vector_stmt costs 1 in body
_7 < min_v_15(D) 1 times scalar_to_vec costs 1 in prologue
_7 < min_v_15(D) 1 times vector_stmt costs 1 in body
_9 ? last_16 : last_19 1 times vector_stmt costs 1 in body
foo.c:9:21: note:  operating on full vectors.
foo.c:9:21: note:  cost model disabled.
foo.c:9:21: note:  epilog loop required
foo.c:9:21: note:  vect_can_advance_ivs_p:
foo.c:9:21: note:  Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:  reduc or virtual phi. skip.
foo.c:9:21: note:  Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:  Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:  ***** Analysis succeeded with vector mode V4HI
foo.c:9:21: note:  ***** Choosing vector mode V4HI
foo.c:9:21: note:  ***** Re-trying epilogue analysis with vector mode V16QI
foo.c:9:21: note:   === vect_analyze_data_refs ===
foo.c:9:21: note:   got vectype for stmt: aval_13 = *_3;
vector(8) short int
foo.c:9:21: note:   got vectype for stmt: _7 = *_6;
vector(4) int
foo.c:9:21: note:   === vect_analyze_scalar_cycles ===
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   Access function of PHI: last_19
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Access function of PHI: {0, +, 1}_1
foo.c:9:21: note:   step: 1,  init: 0
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   Access function of PHI: {43, +, 4294967295}_1
foo.c:9:21: note:   step: 4294967295,  init: 43
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduction path: last_8 last_19 
foo.c:9:21: note:   reduction: detected reduction
foo.c:9:21: note:   Detected reduction.
foo.c:9:21: note:   === vect_determine_precisions ===
foo.c:9:21: note:   using boolean precision 32 for _9 = _7 < min_v_15(D);
foo.c:9:21: note:   ivtmp_10 has no range info
foo.c:9:21: note:   i_17 has range [0x1, 0x2b]
foo.c:9:21: note:   can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1;
foo.c:9:21: note:   last_8 has no range info
foo.c:9:21: note:   last_16 has no range info
foo.c:9:21: note:   _7 has no range info
foo.c:9:21: note:   _5 has range [0x0, 0xa8]
foo.c:9:21: note:   can narrow to unsigned:8 without loss of precision: _5 = _1 * 4;
foo.c:9:21: note:   aval_13 has no range info
foo.c:9:21: note:   _2 has range [0x0, 0x54]
foo.c:9:21: note:   can narrow to unsigned:7 without loss of precision: _2 = _1 * 2;
foo.c:9:21: note:   _1 has range [0x0, 0x2a]
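
The "can narrow to unsigned:N" notes above follow directly from the printed ranges: once _1 is known to lie in [0, 0x2a], each derived value needs only as many bits as its largest possible value. A minimal C sketch of that arithmetic, assuming ranges that are unsigned and start at 0 or 1 (a simplified model, not the logic of GCC's vect_determine_precisions; bits_for_max and check_dump_precisions are hypothetical names):

```c
#include <assert.h>

/* Smallest number of bits that can hold every value in the unsigned
   range [0, max_val].  Returns 0 for max_val == 0.  */
static unsigned bits_for_max(unsigned long max_val)
{
  unsigned bits = 0;
  while (max_val != 0) {
    bits++;
    max_val >>= 1;
  }
  return bits;
}

/* Re-derive the narrowings from the dump: i_21 runs over [0, 42], so
   _1 = (long unsigned int) i_21 is in [0, 0x2a].  */
static void check_dump_precisions(void)
{
  const unsigned long max_1 = 0x2a;      /* _1 has range [0x0, 0x2a] */

  assert(bits_for_max(max_1 + 1) == 6);  /* i_17 = i_21 + 1: [0x1, 0x2b] */
  assert(bits_for_max(max_1 * 2) == 7);  /* _2 = _1 * 2:     [0x0, 0x54] */
  assert(bits_for_max(max_1 * 4) == 8);  /* _5 = _1 * 4:     [0x0, 0xa8] */
}
```

So 43 fits in 6 bits, 84 in 7 and 168 in 8, matching unsigned:6, unsigned:7 and unsigned:8.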
foo.c:9:21: note:   === vect_pattern_recog ===
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _2 = _1 * 2;
foo.c:9:21: note:   widen_mult pattern recognized: patt_45 = (long unsigned int) patt_44;
foo.c:9:21: note:   extra pattern stmt: patt_44 = i_21 w* 2;
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _5 = _1 * 4;
foo.c:9:21: note:   widen_mult pattern recognized: patt_47 = (long unsigned int) patt_46;
foo.c:9:21: note:   extra pattern stmt: patt_46 = i_21 w* 4;
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction
foo.c:9:21: note:   === vect_analyze_data_ref_accesses ===
foo.c:9:21: note:   === vect_mark_stmts_to_be_vectorized ===
foo.c:9:21: note:   init: phi relevant? last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   init: phi relevant? i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   init: stmt relevant? _1 = (long unsigned int) i_21;
foo.c:9:21: note:   init: stmt relevant? _2 = _1 * 2;
foo.c:9:21: note:   init: stmt relevant? _3 = a_12(D) + _2;
foo.c:9:21: note:   init: stmt relevant? aval_13 = *_3;
foo.c:9:21: note:   init: stmt relevant? _5 = _1 * 4;
foo.c:9:21: note:   init: stmt relevant? _6 = b_14(D) + _5;
foo.c:9:21: note:   init: stmt relevant? _7 = *_6;
foo.c:9:21: note:   init: stmt relevant? last_16 = (int) aval_13;
foo.c:9:21: note:   init: stmt relevant? _9 = _7 < min_v_15(D);
foo.c:9:21: note:   init: stmt relevant? last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vec_stmt_relevant_p: used out of loop.
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   vec_stmt_relevant_p: stmt live but not relevant.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   init: stmt relevant? i_17 = i_21 + 1;
foo.c:9:21: note:   init: stmt relevant? ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   init: stmt relevant? if (ivtmp_10 != 0)
foo.c:9:21: note:   worklist: examine stmt: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction
foo.c:9:21: note:   reduc-stmt defining reduc-phi in the same nest.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   already marked relevant/live.
foo.c:9:21: note:   vect_is_simple_use: operand 108, type of def: constant
foo.c:9:21: note:   worklist: examine stmt: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _7 = *_6;
foo.c:9:21: note:   vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:   worklist: examine stmt: _7 = *_6;
foo.c:9:21: note:   === vect_analyze_data_ref_dependences ===
foo.c:9:21: note:   === vect_determine_vectorization_factor ===
foo.c:9:21: note:   ==> examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   get vectype for scalar type:  int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   ==> examining statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _2 = _1 * 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_44 = i_21 w* 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_45 = (long unsigned int) patt_44;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _3 = a_12(D) + _2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: aval_13 = *_3;
foo.c:9:21: note:   precomputed vectype: vector(8) short int
foo.c:9:21: note:   nunits = 8
foo.c:9:21: note:   ==> examining statement: _5 = _1 * 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern def stmt: patt_46 = i_21 w* 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_47 = (long unsigned int) patt_46;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _6 = b_14(D) + _5;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _7 = *_6;
foo.c:9:21: note:   precomputed vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_16 = (int) aval_13;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   get vectype for smallest scalar type: short int
foo.c:9:21: note:   nunits vectype: vector(8) short int
foo.c:9:21: note:   nunits = 8
foo.c:9:21: note:   ==> examining statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vectype: vector(4) <signed-boolean:32>
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(4) int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: i_17 = i_21 + 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note:   skip.
foo.c:9:21: note:   vectorization factor = 8
foo.c:9:21: note:   === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note:   === vect_analyze_slp ===
foo.c:9:21: note:   === vect_make_slp_decision ===
foo.c:9:21: note:  vectorization_factor = 8, niters = 43
foo.c:9:21: note:   === vect_analyze_data_refs_alignment ===
foo.c:9:21: note:   recording new base alignment for a_12(D)
  alignment:    2
  misalignment: 0
  based on:     aval_13 = *_3;
foo.c:9:21: note:   recording new base alignment for b_14(D)
  alignment:    4
  misalignment: 0
  based on:     _7 = *_6;
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_3
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_6
foo.c:9:21: note:   === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note:   === vect_dissolve_slp_only_groups ===
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: missed:  bad operation or unsupported loop bound.
foo.c:9:21: note:  ***** Analysis  failed with vector mode V16QI
foo.c:9:21: note:  ***** The result for vector mode V8QI would be the same
foo.c:9:21: note:  ***** Re-trying epilogue analysis with vector mode V2SI
foo.c:9:21: note:   === vect_analyze_data_refs ===
foo.c:9:21: note:   got vectype for stmt: aval_13 = *_3;
vector(4) short int
foo.c:9:21: note:   got vectype for stmt: _7 = *_6;
vector(2) int
foo.c:9:21: note:   === vect_analyze_scalar_cycles ===
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   Access function of PHI: last_19
foo.c:9:21: note:   Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   Access function of PHI: {0, +, 1}_1
foo.c:9:21: note:   step: 1,  init: 0
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   Access function of PHI: {43, +, 4294967295}_1
foo.c:9:21: note:   step: 4294967295,  init: 43
foo.c:9:21: note:   Detected induction.
foo.c:9:21: note:   Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   reduction path: last_8 last_19 
foo.c:9:21: note:   reduction: detected reduction
foo.c:9:21: note:   Detected reduction.
foo.c:9:21: note:   === vect_determine_precisions ===
foo.c:9:21: note:   using boolean precision 32 for _9 = _7 < min_v_15(D);
foo.c:9:21: note:   ivtmp_10 has no range info
foo.c:9:21: note:   i_17 has range [0x1, 0x2b]
foo.c:9:21: note:   can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1;
foo.c:9:21: note:   last_8 has no range info
foo.c:9:21: note:   last_16 has no range info
foo.c:9:21: note:   _7 has no range info
foo.c:9:21: note:   _5 has range [0x0, 0xa8]
foo.c:9:21: note:   can narrow to unsigned:8 without loss of precision: _5 = _1 * 4;
foo.c:9:21: note:   aval_13 has no range info
foo.c:9:21: note:   _2 has range [0x0, 0x54]
foo.c:9:21: note:   can narrow to unsigned:7 without loss of precision: _2 = _1 * 2;
foo.c:9:21: note:   _1 has range [0x0, 0x2a]
foo.c:9:21: note:   === vect_pattern_recog ===
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _2 = _1 * 2;
foo.c:9:21: note:   vect_recog_mult_pattern: detected: _2 = _1 * 2;
foo.c:9:21: note:   mult pattern recognized: patt_48 = _1 << 1;
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_recog_widen_mult_pattern: detected: _5 = _1 * 4;
foo.c:9:21: note:   vect_recog_mult_pattern: detected: _5 = _1 * 4;
foo.c:9:21: note:   mult pattern recognized: patt_49 = _1 << 2;
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction
foo.c:9:21: note:   vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction
foo.c:9:21: note:   === vect_analyze_data_ref_accesses ===
foo.c:9:21: note:   === vect_mark_stmts_to_be_vectorized ===
foo.c:9:21: note:   init: phi relevant? last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   init: phi relevant? i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   init: stmt relevant? _1 = (long unsigned int) i_21;
foo.c:9:21: note:   init: stmt relevant? _2 = _1 * 2;
foo.c:9:21: note:   init: stmt relevant? _3 = a_12(D) + _2;
foo.c:9:21: note:   init: stmt relevant? aval_13 = *_3;
foo.c:9:21: note:   init: stmt relevant? _5 = _1 * 4;
foo.c:9:21: note:   init: stmt relevant? _6 = b_14(D) + _5;
foo.c:9:21: note:   init: stmt relevant? _7 = *_6;
foo.c:9:21: note:   init: stmt relevant? last_16 = (int) aval_13;
foo.c:9:21: note:   init: stmt relevant? _9 = _7 < min_v_15(D);
foo.c:9:21: note:   init: stmt relevant? last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vec_stmt_relevant_p: used out of loop.
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   vec_stmt_relevant_p: stmt live but not relevant.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   init: stmt relevant? i_17 = i_21 + 1;
foo.c:9:21: note:   init: stmt relevant? ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   init: stmt relevant? if (ivtmp_10 != 0)
foo.c:9:21: note:   worklist: examine stmt: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction
foo.c:9:21: note:   reduc-stmt defining reduc-phi in the same nest.
foo.c:9:21: note:   mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   already marked relevant/live.
foo.c:9:21: note:   vect_is_simple_use: operand 108, type of def: constant
foo.c:9:21: note:   worklist: examine stmt: last_16 = (int) aval_13;
foo.c:9:21: note:   vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: aval_13 = *_3;
foo.c:9:21: note:   worklist: examine stmt: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:   mark relevant 1, live 0: _7 = *_6;
foo.c:9:21: note:   vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:   worklist: examine stmt: _7 = *_6;
foo.c:9:21: note:   === vect_analyze_data_ref_dependences ===
foo.c:9:21: note:   === vect_determine_vectorization_factor ===
foo.c:9:21: note:   ==> examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   get vectype for scalar type:  int
foo.c:9:21: note:   vectype: vector(2) int
foo.c:9:21: note:   nunits = 2
foo.c:9:21: note:   ==> examining phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:   ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:   ==> examining statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _2 = _1 * 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_48 = _1 << 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _3 = a_12(D) + _2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: aval_13 = *_3;
foo.c:9:21: note:   precomputed vectype: vector(4) short int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: _5 = _1 * 4;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining pattern statement: patt_49 = _1 << 2;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _6 = b_14(D) + _5;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: _7 = *_6;
foo.c:9:21: note:   precomputed vectype: vector(2) int
foo.c:9:21: note:   nunits = 2
foo.c:9:21: note:   ==> examining statement: last_16 = (int) aval_13;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(2) int
foo.c:9:21: note:   get vectype for smallest scalar type: short int
foo.c:9:21: note:   nunits vectype: vector(4) short int
foo.c:9:21: note:   nunits = 4
foo.c:9:21: note:   ==> examining statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:   vectype: vector(2) <signed-boolean:32>
foo.c:9:21: note:   nunits = 2
foo.c:9:21: note:   ==> examining statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:   get vectype for scalar type: int
foo.c:9:21: note:   vectype: vector(2) int
foo.c:9:21: note:   nunits = 2
foo.c:9:21: note:   ==> examining statement: i_17 = i_21 + 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:   skip.
foo.c:9:21: note:   ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note:   skip.
foo.c:9:21: note:   vectorization factor = 4
foo.c:9:21: note:   === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note:   === vect_analyze_slp ===
foo.c:9:21: note:   === vect_make_slp_decision ===
foo.c:9:21: note:  vectorization_factor = 4, niters = 43
foo.c:9:21: note:   === vect_analyze_data_refs_alignment ===
foo.c:9:21: note:   recording new base alignment for a_12(D)
  alignment:    2
  misalignment: 0
  based on:     aval_13 = *_3;
foo.c:9:21: note:   recording new base alignment for b_14(D)
  alignment:    4
  misalignment: 0
  based on:     _7 = *_6;
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_3
foo.c:9:21: note:   vect_compute_data_ref_alignment:
foo.c:9:21: note:   can't force alignment of ref: *_6
foo.c:9:21: note:   === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note:   === vect_dissolve_slp_only_groups ===
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: missed:  bad operation or unsupported loop bound.
foo.c:9:21: note:  ***** Analysis  failed with vector mode V2SI
foo.c:9:21: optimized: loop vectorized using 8 byte vectors
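
For reference, the lane counts in the dump account for which attempts survive. With V4HI, a[i] is loaded as vector(4) short int and b[i] and the reduction use vector(4) int, so VF = 4 and every statement needs a single vector copy. The epilogue retries with V16QI (vector(8) short int, vector(4) int, VF = 8) and V2SI (vector(4) short int, vector(2) int, VF = 4) both need two int copies per vector iteration, and that is when the "multiple types in double reduction or condition reduction" rejection fires. The C sketch below only models that arithmetic; the struct, the table and the helper names are hypothetical, and the check is a simplification of what the vectorizer actually tests:

```c
#include <assert.h>

/* Lane counts read off the vectype lines of each analysis attempt.  */
struct attempt {
  const char *mode;  /* vector mode named in the dump */
  int short_lanes;   /* N in vector(N) short int, used for a[i] */
  int int_lanes;     /* M in vector(M) int, used for b[i] and last */
};

static const struct attempt attempts[] = {
  { "V4HI",  4, 4 },  /* main loop: analysis succeeded */
  { "V16QI", 8, 4 },  /* epilogue retry: failed */
  { "V2SI",  4, 2 },  /* epilogue retry: failed */
};

/* VF is the largest lane count in the loop (all counts here are
   powers of two, so this equals their least common multiple).  */
static int vf(const struct attempt *t)
{
  return t->short_lanes > t->int_lanes ? t->short_lanes : t->int_lanes;
}

/* Vector statements needed per scalar int statement.  */
static int int_ncopies(const struct attempt *t)
{
  return vf(t) / t->int_lanes;
}

/* Simplified model: a condition reduction is accepted only when it
   needs a single copy.  */
static int cond_reduction_ok(const struct attempt *t)
{
  return int_ncopies(t) == 1;
}

static void check_dump_attempts(void)
{
  assert(vf(&attempts[0]) == 4 && cond_reduction_ok(&attempts[0]));
  assert(vf(&attempts[1]) == 8 && !cond_reduction_ok(&attempts[1]));
  assert(vf(&attempts[2]) == 4 && !cond_reduction_ok(&attempts[2]));
  /* niters = 43 with VF = 4: 10 vector iterations and 3 scalar ones,
     consistent with "New loop exit condition: if (ivtmp_91 < 10)".  */
  assert(43 / vf(&attempts[0]) == 10 && 43 % vf(&attempts[0]) == 3);
}
```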
foo.c:9:21: note:  === vec_transform_loop ===
split exit edge
split exit edge of scalar loop
Removing basic block 19
;; basic block 19, loop depth 0
;;  pred:       16
;;  succ:      


foo.c:9:21: note:  vect_can_advance_ivs_p:
foo.c:9:21: note:  Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:  reduc or virtual phi. skip.
foo.c:9:21: note:  Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:  Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note:  vect_update_ivs_after_vectorizer: phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:  reduc or virtual phi. skip.
foo.c:9:21: note:  vect_update_ivs_after_vectorizer: phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note:  vect_update_ivs_after_vectorizer: phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
;; Guessed iterations of loop 3 is 42.052870. New upper bound 2.
;; Scaling loop 3 with scale 7.0% (guessed) to reach upper bound 2
foo.c:9:21: note:  ------>vectorizing phi: last_19 = PHI <last_8(7), 108(25)>
foo.c:9:21: note:  transform phi.
foo.c:9:21: note:  ------>vectorizing phi: i_21 = PHI <i_17(7), 0(25)>
foo.c:9:21: note:  ------>vectorizing phi: ivtmp_18 = PHI <ivtmp_10(7), 43(25)>
foo.c:9:21: note:  ------>vectorizing phi: vect_last_19.7_67 = PHI <(7), { 108, 108, 108, 108 }(25)>
foo.c:9:21: note:  ------>vectorizing statement: _1 = (long unsigned int) i_21;
foo.c:9:21: note:  ------>vectorizing statement: patt_40 = i_21 w* 2;
foo.c:9:21: note:  ------>vectorizing statement: patt_41 = (long unsigned int) patt_40;
foo.c:9:21: note:  ------>vectorizing statement: _3 = a_12(D) + _2;
foo.c:9:21: note:  ------>vectorizing statement: aval_13 = *_3;
foo.c:9:21: note:  transform statement.
foo.c:9:21: note:  transform load. ncopies = 1
foo.c:9:21: note:  create vector_type-pointer variable to type: vector(4) short int  vectorizing a pointer ref: *a_12(D)
foo.c:9:21: note:  created a_12(D)
foo.c:9:21: note:  add new stmt: vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68];
foo.c:9:21: note:  ------>vectorizing statement: patt_42 = i_21 w* 4;
foo.c:9:21: note:  ------>vectorizing statement: patt_43 = (long unsigned int) patt_42;
foo.c:9:21: note:  ------>vectorizing statement: _6 = b_14(D) + _5;
foo.c:9:21: note:  ------>vectorizing statement: _7 = *_6;
foo.c:9:21: note:  transform statement.
foo.c:9:21: note:  transform load. ncopies = 1
foo.c:9:21: note:  create vector_type-pointer variable to type: vector(4) int  vectorizing a pointer ref: *b_14(D)
foo.c:9:21: note:  created b_14(D)
foo.c:9:21: note:  add new stmt: vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71];
foo.c:9:21: note:  ------>vectorizing statement: last_16 = (int) aval_13;
foo.c:9:21: note:  transform statement.
foo.c:9:21: note:  vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:  vect_is_simple_use: vectype vector(4) short int
foo.c:9:21: note:  transform conversion. ncopies = 1.
foo.c:9:21: note:  vect_get_vec_defs_for_operand: aval_13
foo.c:9:21: note:  vect_is_simple_use: operand *_3, type of def: internal
foo.c:9:21: note:    def_stmt =  aval_13 = *_3;
foo.c:9:21: note:  add new stmt: vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70;
foo.c:9:21: note:  ------>vectorizing statement: _9 = _7 < min_v_15(D);
foo.c:9:21: note:  transform statement.
foo.c:9:21: note:  vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:  vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:  vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:  vect_get_vec_defs_for_operand: _7
foo.c:9:21: note:  vect_is_simple_use: operand *_6, type of def: internal
foo.c:9:21: note:    def_stmt =  _7 = *_6;
foo.c:9:21: note:  vect_get_vec_defs_for_operand: min_v_15(D)
foo.c:9:21: note:  vect_is_simple_use: operand min_v_15(D), type of def: external
foo.c:9:21: note:  created new init_stmt: vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)};
foo.c:9:21: note:  add new stmt: mask__9.15_76 = vect__7.13_73 < vect_cst__75;
foo.c:9:21: note:  ------>vectorizing statement: last_8 = _9 ? last_16 : last_19;
foo.c:9:21: note:  transform statement.
foo.c:9:21: note:  vect_is_simple_use: operand _7 < min_v_15(D), type of def: 
internal
foo.c:9:21: note:  vect_is_simple_use: vectype vector(4) <signed-boolean:32>
foo.c:9:21: note:  vect_is_simple_use: operand (int) aval_13, type of def: 
internal
foo.c:9:21: note:  vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:  vect_is_simple_use: operand last_19 = PHI <last_8(7), 
108(25)>, type of def: reduction
foo.c:9:21: note:  vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:  vect_get_vec_defs_for_operand: _9
foo.c:9:21: note:  vect_is_simple_use: operand _7 < min_v_15(D), type of def: 
internal
foo.c:9:21: note:    def_stmt =  _9 = _7 < min_v_15(D);
foo.c:9:21: note:  vect_get_vec_defs_for_operand: last_16
foo.c:9:21: note:  vect_is_simple_use: operand (int) aval_13, type of def: 
internal
foo.c:9:21: note:    def_stmt =  last_16 = (int) aval_13;
foo.c:9:21: note:  vect_get_vec_defs_for_operand: last_19
foo.c:9:21: note:  vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(25)>, type of def: reduction
foo.c:9:21: note:    def_stmt =  last_19 = PHI <last_8(7), 108(25)>
foo.c:9:21: note:  add new stmt: vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, vect_last_16.14_74, vect_last_19.7_67>;
foo.c:9:21: note:  ------>vectorizing statement: i_17 = i_21 + 1;
foo.c:9:21: note:  ------>vectorizing statement: ivtmp_10 = ivtmp_18 - 1;
foo.c:9:21: note:  ------>vectorizing statement: if (ivtmp_10 != 0)
foo.c:9:21: note:  New loop exit condition: if (ivtmp_91 < 10)
;; Scaling loop 1 with scale 25.0% (adjusted)
;; Guessed iterations of loop 1 is 9.763217. New upper bound 9.
;; Scaling loop 1 with scale 92.9% (guessed) to reach upper bound 9
foo.c:9:21: note:  LOOP VECTORIZED

foo.c:4:1: note: vectorized 1 loops in function.
;; Created LCSSA PHI: _92 = PHI <_81(3)>

Updating SSA:
Registering new PHI nodes in block #3
Updating SSA information for statement _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>;
Registering new PHI nodes in block #7
Registering new PHI nodes in block #20
Updating SSA information for statement _83 = .REDUC_MAX (_81);
Updating SSA information for statement _85 = _81 == _84;
Registering new PHI nodes in block #21

SSA replacement table
N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j

_92 -> { _81 }
Incremental SSA update started at block: 3
Number of blocks in CFG: 26
Number of blocks to update: 3 ( 12%)
Affected blocks: 3 7 20


Processing block 0: BB25
Value numbering stmt = vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)};
Setting value number of vect_cst__75 to vect_cst__75 (changed)
marking outgoing edge 25 -> 3 executable
Making available beyond BB25 vect_cst__75 for value vect_cst__75
Processing block 1: BB3
Cannot trust state of predecessor edge 7 -> 3, marking executable
Value numbering stmt = last_19 = PHI <last_8(7), 108(25)>
Setting value number of last_19 to last_19 (changed)
Making available beyond BB3 last_19 for value last_19
Value numbering stmt = i_21 = PHI <i_17(7), 0(25)>
Setting value number of i_21 to i_21 (changed)
Making available beyond BB3 i_21 for value i_21
Value numbering stmt = ivtmp_18 = PHI <ivtmp_10(7), 43(25)>
Setting value number of ivtmp_18 to ivtmp_18 (changed)
Making available beyond BB3 ivtmp_18 for value ivtmp_18
Value numbering stmt = vect_last_19.7_67 = PHI <vect_last_8.16_77(7), { 108, 108, 108, 108 }(25)>
Setting value number of vect_last_19.7_67 to vect_last_19.7_67 (changed)
Making available beyond BB3 vect_last_19.7_67 for value vect_last_19.7_67
Value numbering stmt = vectp_a.8_68 = PHI <vectp_a.8_69(7), a_12(D)(25)>
Setting value number of vectp_a.8_68 to vectp_a.8_68 (changed)
Making available beyond BB3 vectp_a.8_68 for value vectp_a.8_68
Value numbering stmt = vectp_b.11_71 = PHI <vectp_b.11_72(7), b_14(D)(25)>
Setting value number of vectp_b.11_71 to vectp_b.11_71 (changed)
Making available beyond BB3 vectp_b.11_71 for value vectp_b.11_71
Value numbering stmt = ivtmp_78 = PHI <ivtmp_79(7), { 1, 2, 3, 4 }(25)>
Setting value number of ivtmp_78 to ivtmp_78 (changed)
Making available beyond BB3 ivtmp_78 for value ivtmp_78
Value numbering stmt = _80 = PHI <_81(7), { 0, 0, 0, 0 }(25)>
Setting value number of _80 to _80 (changed)
Making available beyond BB3 _80 for value _80
Value numbering stmt = ivtmp_90 = PHI <ivtmp_91(7), 0(25)>
Setting value number of ivtmp_90 to ivtmp_90 (changed)
Making available beyond BB3 ivtmp_90 for value ivtmp_90
Value numbering stmt = _1 = (long unsigned int) i_21;
Setting value number of _1 to _1 (changed)
Making available beyond BB3 _1 for value _1
Value numbering stmt = _2 = _1 * 2;
Setting value number of _2 to _2 (changed)
Making available beyond BB3 _2 for value _2
Value numbering stmt = _3 = a_12(D) + _2;
Setting value number of _3 to _3 (changed)
Making available beyond BB3 _3 for value _3
Value numbering stmt = vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68];
Setting value number of vect_aval_13.10_70 to vect_aval_13.10_70 (changed)
Making available beyond BB3 vect_aval_13.10_70 for value vect_aval_13.10_70
Value numbering stmt = aval_13 = *_3;
Setting value number of aval_13 to aval_13 (changed)
Making available beyond BB3 aval_13 for value aval_13
Value numbering stmt = _5 = _1 * 4;
Setting value number of _5 to _5 (changed)
Making available beyond BB3 _5 for value _5
Value numbering stmt = _6 = b_14(D) + _5;
Setting value number of _6 to _6 (changed)
Making available beyond BB3 _6 for value _6
Value numbering stmt = vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71];
Setting value number of vect__7.13_73 to vect__7.13_73 (changed)
Making available beyond BB3 vect__7.13_73 for value vect__7.13_73
Value numbering stmt = _7 = *_6;
Setting value number of _7 to _7 (changed)
Making available beyond BB3 _7 for value _7
Value numbering stmt = vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70;
Setting value number of vect_last_16.14_74 to vect_last_16.14_74 (changed)
Making available beyond BB3 vect_last_16.14_74 for value vect_last_16.14_74
Value numbering stmt = last_16 = (int) aval_13;
Setting value number of last_16 to last_16 (changed)
Making available beyond BB3 last_16 for value last_16
Value numbering stmt = mask__9.15_76 = vect__7.13_73 < vect_cst__75;
Setting value number of mask__9.15_76 to mask__9.15_76 (changed)
Making available beyond BB3 mask__9.15_76 for value mask__9.15_76
Value numbering stmt = _9 = _7 < min_v_15(D);
Setting value number of _9 to _9 (changed)
Making available beyond BB3 _9 for value _9
Value numbering stmt = vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, vect_last_16.14_74, vect_last_19.7_67>;
Setting value number of vect_last_8.16_77 to vect_last_8.16_77 (changed)
Making available beyond BB3 vect_last_8.16_77 for value vect_last_8.16_77
Value numbering stmt = last_8 = _9 ? last_16 : last_19;
Setting value number of last_8 to last_8 (changed)
Making available beyond BB3 last_8 for value last_8
Value numbering stmt = i_17 = i_21 + 1;
Setting value number of i_17 to i_17 (changed)
Making available beyond BB3 i_17 for value i_17
Value numbering stmt = ivtmp_10 = ivtmp_18 - 1;
Setting value number of ivtmp_10 to ivtmp_10 (changed)
Making available beyond BB3 ivtmp_10 for value ivtmp_10
Value numbering stmt = vectp_a.8_69 = vectp_a.8_68 + 8;
Setting value number of vectp_a.8_69 to vectp_a.8_69 (changed)
Making available beyond BB3 vectp_a.8_69 for value vectp_a.8_69
Value numbering stmt = vectp_b.11_72 = vectp_b.11_71 + 16;
Setting value number of vectp_b.11_72 to vectp_b.11_72 (changed)
Making available beyond BB3 vectp_b.11_72 for value vectp_b.11_72
Value numbering stmt = _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>;
Setting value number of _81 to _81 (changed)
Making available beyond BB3 _81 for value _81
Value numbering stmt = ivtmp_79 = ivtmp_78 + { 4, 4, 4, 4 };
Setting value number of ivtmp_79 to ivtmp_79 (changed)
Making available beyond BB3 ivtmp_79 for value ivtmp_79
Value numbering stmt = ivtmp_91 = ivtmp_90 + 1;
Setting value number of ivtmp_91 to ivtmp_91 (changed)
Making available beyond BB3 ivtmp_91 for value ivtmp_91
Value numbering stmt = if (ivtmp_91 < 10)
Recording on edge 3->7 ivtmp_91 lt_expr 10 == true
Recording on edge 3->7 ivtmp_91 ge_expr 10 == false
Recording on edge 3->7 ivtmp_91 ne_expr 10 == true
Recording on edge 3->7 ivtmp_91 le_expr 10 == true
Recording on edge 3->7 ivtmp_91 gt_expr 10 == false
Recording on edge 3->7 ivtmp_91 eq_expr 10 == false
marking outgoing edge 3 -> 7 executable
marking destination block 20 reachable
Processing block 2: BB7
RPO iteration over 3 blocks visited 3 blocks in total discovering 3 executable blocks iterating 1.0 times, a block was visited max. 1 times
RPO tracked 35 values available at 32 locations and 35 lattice elements
Removing basic block 9
;; basic block 9, loop depth 1
;;  pred:       16
;;              13
# last_23 = PHI <108(16), last_34(13)>
# i_24 = PHI <0(16), i_35(13)>
# ivtmp_25 = PHI <43(16), ivtmp_36(13)>
_26 = (long unsigned int) i_24;
_27 = _26 * 2;
_28 = a_12(D) + _27;
aval_29 = *_28;
_30 = _26 * 4;
_31 = b_14(D) + _30;
_32 = *_31;
if (_32 < min_v_15(D))
  goto <bb 11>; [50.00%]
else
  goto <bb 12>; [50.00%]
;;  succ:       11
;;              12


Removing basic block 11
;; basic block 11, loop depth 1
;;  pred:      
last_33 = (int) _29;
;;  succ:       12


Removing basic block 12
;; basic block 12, loop depth 1
;;  pred:      
# last_34 = PHI <>
i_35 = _24 + 1;
ivtmp_36 = _25 - 1;
if (ivtmp_36 != 0)
  goto <bb 13>; [97.68%]
else
  goto <bb 18>; [2.32%]
;;  succ:       13
;;              18


Removing basic block 13
;; basic block 13, loop depth 1
;;  pred:      
;;  succ:      


Removing basic block 16
;; basic block 16, loop depth 0
;;  pred:      
;;  succ:      


Removing basic block 18
;; basic block 18, loop depth 0
;;  pred:      
# last_51 = PHI <>
goto <bb 6>; [100.00%]
;;  succ:       6


Merging blocks 2 and 15
Merging blocks 17 and 6
Merging blocks 2 and 25
fix_loop_structure: fixing up loops for function
fix_loop_structure: removing loop 2
__attribute__((noipa, noinline, noclone, no_icf))
int condition_reduction (short int * a, int min_v, int * b)
{
  int stmp_last_8.17;
  vector(4) int vect_last_8.16;
  vector(4) <signed-boolean:32> mask__9.15;
  vector(4) int vect_last_16.14;
  vector(4) int vect__7.13;
  int * vectp_b.12;
  vector(4) int * vectp_b.11;
  vector(4) short int vect_aval_13.10;
  short int * vectp_a.9;
  vector(4) short int * vectp_a.8;
  vector(4) int vect_last_19.7;
  unsigned int tmp.6;
  int tmp.5;
  int i;
  short int aval;
  int last;
  long unsigned int _1;
  long unsigned int _2;
  short int * _3;
  long unsigned int _5;
  int * _6;
  int _7;
  _Bool _9;
  unsigned int ivtmp_10;
  unsigned int ivtmp_18;
  _Bool _22;
  unsigned int ivtmp_54;
  long unsigned int _55;
  long unsigned int _56;
  short int * _57;
  long unsigned int _59;
  int * _60;
  int _61;
  unsigned int ivtmp_64;
  vector(4) int vect_cst__75;
  vector(4) unsigned int ivtmp_78;
  vector(4) unsigned int ivtmp_79;
  vector(4) unsigned int _80;
  vector(4) unsigned int _81;
  unsigned int _83;
  vector(4) unsigned int _84;
  vector(4) <signed-boolean:32> _85;
  vector(4) int _86;
  vector(4) unsigned int _87;
  unsigned int _88;
  int _89;
  unsigned int ivtmp_90;
  unsigned int ivtmp_91;
  vector(4) unsigned int _92;

  <bb 2> [local count: 24373936]:
  _22 = 1;
  vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)};

  <bb 3> [local count: 243739360]:
  # last_19 = PHI <last_8(7), 108(2)>
  # i_21 = PHI <i_17(7), 0(2)>
  # ivtmp_18 = PHI <ivtmp_10(7), 43(2)>
  # vect_last_19.7_67 = PHI <vect_last_8.16_77(7), { 108, 108, 108, 108 }(2)>
  # vectp_a.8_68 = PHI <vectp_a.8_69(7), a_12(D)(2)>
  # vectp_b.11_71 = PHI <vectp_b.11_72(7), b_14(D)(2)>
  # ivtmp_78 = PHI <ivtmp_79(7), { 1, 2, 3, 4 }(2)>
  # _80 = PHI <_81(7), { 0, 0, 0, 0 }(2)>
  # ivtmp_90 = PHI <ivtmp_91(7), 0(2)>
  _1 = (long unsigned int) i_21;
  _2 = _1 * 2;
  _3 = a_12(D) + _2;
  vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68];
  aval_13 = *_3;
  _5 = _1 * 4;
  _6 = b_14(D) + _5;
  vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71];
  _7 = *_6;
  vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70;
  last_16 = (int) aval_13;
  mask__9.15_76 = vect__7.13_73 < vect_cst__75;
  _9 = _7 < min_v_15(D);
  vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, vect_last_16.14_74, vect_last_19.7_67>;
  last_8 = _9 ? last_16 : last_19;
  i_17 = i_21 + 1;
  ivtmp_10 = ivtmp_18 - 1;
  vectp_a.8_69 = vectp_a.8_68 + 8;
  vectp_b.11_72 = vectp_b.11_71 + 16;
  _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>;
  ivtmp_79 = ivtmp_78 + { 4, 4, 4, 4 };
  ivtmp_91 = ivtmp_90 + 1;
  if (ivtmp_91 < 10)
    goto <bb 7>; [90.00%]
  else
    goto <bb 20>; [10.00%]

  <bb 7> [local count: 219365424]:
  goto <bb 3>; [100.00%]

  <bb 20> [local count: 24373936]:
  # last_66 = PHI <last_8(3)>
  # vect_last_8.16_82 = PHI <vect_last_8.16_77(3)>
  # _92 = PHI <_81(3)>
  _83 = .REDUC_MAX (_92);
  _84 = {_83, _83, _83, _83};
  _85 = _92 == _84;
  _86 = VEC_COND_EXPR <_85, vect_last_8.16_82, { 0, 0, 0, 0 }>;
  _87 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(_86);
  _88 = .REDUC_MAX (_87);
  _89 = (int) _88;

  <bb 21> [local count: 73121805]:
  # last_52 = PHI <_89(20), last_62(22)>
  # i_53 = PHI <40(20), i_63(22)>
  # ivtmp_54 = PHI <3(20), ivtmp_64(22)>
  _55 = (long unsigned int) i_53;
  _56 = _55 * 2;
  _57 = a_12(D) + _56;
  aval_58 = *_57;
  _59 = _55 * 4;
  _60 = b_14(D) + _59;
  _61 = *_60;
  if (_61 < min_v_15(D))
    goto <bb 24>; [50.00%]
  else
    goto <bb 23>; [50.00%]

  <bb 22> [local count: 48747874]:
  goto <bb 21>; [100.00%]

  <bb 23> [local count: 73121805]:
  # last_62 = PHI <last_52(21), last_65(24)>
  i_63 = i_53 + 1;
  ivtmp_64 = ivtmp_54 - 1;
  if (ivtmp_64 != 0)
    goto <bb 22>; [66.67%]
  else
    goto <bb 17>; [33.33%]

  <bb 24> [local count: 36560903]:
  last_65 = (int) aval_58;
  goto <bb 23>; [100.00%]

  <bb 17> [local count: 24373936]:
  # last_50 = PHI <last_62(23)>
  return last_50;

}
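For readers following the dump, here is a minimal C sketch (not GCC code) of what the vectorized condition reduction above computes. It assumes the scalar source is the loop recoverable from the GIMPLE: 43 iterations, `last` starting at 108, and `last = a[i]` whenever `b[i] < min_v`. The function names and the `check_equivalence` helper are hypothetical. Each of the 4 lanes keeps its own last selected value plus the 1-based iteration number of its last match (ivtmp_78 starts at { 1, 2, 3, 4 } and _80 at { 0, 0, 0, 0 }). After the 10 vector iterations, one `.REDUC_MAX` over those indices picks the lane holding the overall last match, a second unsigned `.REDUC_MAX` over the masked values extracts it, and the scalar epilogue handles i = 40..42.

```c
#include <assert.h>

#define N 43     /* scalar trip count, from "ivtmp_18 = PHI <..., 43(25)>" */
#define VF 4     /* vectorization factor, from "vector(4) int" */
#define INIT 108 /* initial value of last, from "last_19 = PHI <..., 108(25)>" */

/* Hypothetical scalar loop inferred from the GIMPLE dump.  */
static int
condition_reduction_scalar (const short *a, int min_v, const int *b)
{
  int last = INIT;
  for (int i = 0; i < N; i++)
    if (b[i] < min_v)
      last = a[i];
  return last;
}

/* Lane-by-lane emulation of the vector loop, its reduction epilogue
   and the scalar epilogue loop.  */
static int
condition_reduction_emulated (const short *a, int min_v, const int *b)
{
  int vlast[VF];     /* vect_last_19.7: per-lane last selected value */
  unsigned idx[VF];  /* ivtmp_78: 1-based iteration number of each lane */
  unsigned best[VF]; /* _80/_81: iteration of each lane's last match, 0 = none */

  for (int l = 0; l < VF; l++)
    {
      vlast[l] = INIT;
      idx[l] = l + 1;
      best[l] = 0;
    }

  int i = 0;
  for (int iter = 0; iter < N / VF; iter++) /* 10 vector iterations */
    {
      for (int l = 0; l < VF; l++)
        {
          if (b[i + l] < min_v)    /* lane l of mask__9.15_76 */
            {
              vlast[l] = a[i + l]; /* VEC_COND_EXPR on the values */
              best[l] = idx[l];    /* VEC_COND_EXPR on the indices */
            }
          idx[l] += VF;            /* ivtmp_79 = ivtmp_78 + { 4, 4, 4, 4 } */
        }
      i += VF;
    }

  /* _83 = .REDUC_MAX (_92): iteration number of the overall last match.  */
  unsigned max_idx = 0;
  for (int l = 0; l < VF; l++)
    if (best[l] > max_idx)
      max_idx = best[l];

  /* _85/_86: keep the lane whose index equals max_idx and zero the rest;
     _87/_88: an unsigned max then extracts that lane's value.  If no lane
     matched, every lane still holds INIT, so INIT comes out.  */
  unsigned result = 0;
  for (int l = 0; l < VF; l++)
    {
      unsigned sel = best[l] == max_idx ? (unsigned) vlast[l] : 0u;
      if (sel > result)
        result = sel;
    }
  int last = (int) result; /* _89; GCC defines this conversion as modulo 2^32 */

  /* Scalar epilogue (bb 21 to bb 24) for the remaining N % VF = 3 iterations.  */
  for (; i < N; i++)
    if (b[i] < min_v)
      last = a[i];
  return last;
}

/* Hypothetical self-check over a range of thresholds.  */
static void
check_equivalence (void)
{
  short a[N];
  int b[N];
  for (int i = 0; i < N; i++)
    {
      a[i] = (short) (i * 7 - 100);
      b[i] = (i * 13) % 17;
    }
  for (int min_v = -1; min_v <= 18; min_v++)
    assert (condition_reduction_scalar (a, min_v, b)
            == condition_reduction_emulated (a, min_v, b));
}
```

Calling `check_equivalence ()` from a test `main` exercises both versions. Tracking 1-based iteration numbers, as the dump does, lets 0 mean "this lane never matched", which is what makes the all-lanes-kept fallback to the initial value work.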
