On Thu, 3 Aug 2023 at 17:48, Richard Biener <rguent...@suse.de> wrote:
>
> On Thu, 3 Aug 2023, Richard Biener wrote:
>
> > On Thu, 3 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > >
> > > > > >
> > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > loop when it was unconditional before. That's a not uncommon
> > > > > > > issue preventing vectorization when masked loads are not
> > > > > > > available.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > >
> > > > > > > OK?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > PR tree-optimization/92335
> > > > > > > * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > > optimizations avoid sinking unconditional loads/stores
> > > > > > > in innermost loops to conditional executed places.
> > > > > > >
> > > > > > > * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > > * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > > expect predictive commoning to happen instead of sinking.
> > > > > > > * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > I think it's reasonable -- there's probably going to be cases where it's
> > > > > > not great, but more often than not I think it's going to be a reasonable
> > > > > > heuristic.
> > > > > >
> > > > > > If there is undesirable fallout, better to find it over the coming months
> > > > > > than next spring. So I'd suggest we go forward now to give more time to
> > > > > > find any pathological cases (if they exist).
> > > > >
> > > > > Agreed, I've pushed this now.
> > > > Hi Richard,
> > > > After this patch (committed in
> > > > 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > >
> > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
> > > > ! vect_fold_extract_last } } } } */
> > > >
> > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > regardless of vect_fold_extract_last,
> > > > which gates the above test (which is an improvement, because the
> > > > function didn't get vectorized before the commit).
> > > >
> > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > and the test passes again.
> > > > OK to commit ?
> > >
> > > OK.
> >
> > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > critical target condition is missing. Can you figure out which?
>
> I see
>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>,
> type of def: reduction
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note: vect_is_simple_use: vectype vector(4) int
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed: multiple types in double reduction or condition reduction or
> fold-left reduction.
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> missed: not vectorized: relevant phi not supported: last_19 = PHI
> <last_8(7), 108(15)>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed: bad operation or unsupported loop bound.

Hi Richard,
Looking at the aarch64 vect dump, it seems the loop in condition_reduction
gets vectorized with V4HI mode, while it fails for the other modes in
vectorizable_condition:

  if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
      && ncopies > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "multiple types in double reduction or condition "
                         "reduction or fold-left reduction.\n");
      return false;
    }

From the dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int

For V8HI, VF = 8 and vectype_in = vector(4) int.
Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1,
and so the analysis fails:
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>

For V4HI, VF = 4 and thus ncopies = 1, so it succeeds.

On x86_64, the vectorizer doesn't seem to try V4HI mode.
If I "force" the vectorizer to use V4HI mode, we get the following dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: missed:   multiple types in double reduction or condition reduction or fold-left reduction.

I'm not sure, though, whether this is the only reason the test fails to
vectorize on that target. I will investigate in more detail next week.

Thanks,
Prathamesh
>
> Richard.
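
For readers following the ncopies arithmetic in the message above, here is a minimal sketch of just that calculation. The helper names ncopies_for and rejected_by_ncopies_check are hypothetical and are not GCC functions; the sketch models only the "ncopies > 1" half of the quoted check, for the case where the reduction is a condition reduction, and ignores everything else the vectorizer does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): number of vector statements
   needed per vectorized iteration, i.e. VF / nunits (vectype_in).  */
static inline unsigned
ncopies_for (unsigned vf, unsigned nunits_in)
{
  /* The sketch assumes VF is a whole multiple of the lane count.  */
  assert (nunits_in != 0 && vf % nunits_in == 0);
  return vf / nunits_in;
}

/* Models only the "ncopies > 1" part of the quoted condition, assuming
   reduction_type != TREE_CODE_REDUCTION already holds.  */
static inline bool
rejected_by_ncopies_check (unsigned vf, unsigned nunits_in)
{
  return ncopies_for (vf, nunits_in) > 1;
}
```

With the numbers from the aarch64 dump, ncopies_for (8, 4) gives 2 for V8HI, so the check rejects it, while ncopies_for (4, 4) gives 1 for V4HI, so it passes. For the forced V4HI run on x86_64 the dump shows vectype_in = vector(2) int; if VF is 4 there (an assumption, since the message does not print it), ncopies would again be 2.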
;; Function condition_reduction (condition_reduction, funcdef_no=0, decl_uid=4390, cgraph_uid=1, symbol_order=0) Analyzing loop at foo.c:9 foo.c:9:21: note: === analyze_loop_nest === foo.c:9:21: note: === vect_analyze_loop_form === foo.c:9:21: note: === get_loop_niters === Analyzing # of iterations of loop 1 exit condition [42, + , 4294967295] != 0 bounds on difference of bases: -42 ... -42 result: # of iterations 42, bounded by 42 Creating dr for *_3 analyze_innermost: success. base_address: a_12(D) offset from base address: 0 constant offset from base address: 0 step: 2 base alignment: 2 base misalignment: 0 offset alignment: 128 step alignment: 2 base_object: *a_12(D) Access function 0: {0B, +, 2}_1 Creating dr for *_6 analyze_innermost: success. base_address: b_14(D) offset from base address: 0 constant offset from base address: 0 step: 4 base alignment: 4 base misalignment: 0 offset alignment: 128 step alignment: 4 base_object: *b_14(D) Access function 0: {0B, +, 4}_1 foo.c:9:21: note: === vect_analyze_data_refs === foo.c:9:21: note: got vectype for stmt: aval_13 = *_3; vector(8) short int foo.c:9:21: note: got vectype for stmt: _7 = *_6; vector(4) int foo.c:9:21: note: === vect_analyze_scalar_cycles === foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: Access function of PHI: last_19 foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Access function of PHI: {0, +, 1}_1 foo.c:9:21: note: step: 1, init: 0 foo.c:9:21: note: Detected induction. foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: Access function of PHI: {43, +, 4294967295}_1 foo.c:9:21: note: step: 4294967295, init: 43 foo.c:9:21: note: Detected induction. foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduction path: last_8 last_19 foo.c:9:21: note: reduction: detected reduction foo.c:9:21: note: Detected reduction. 
foo.c:9:21: note: === vect_determine_precisions === foo.c:9:21: note: using boolean precision 32 for _9 = _7 < min_v_15(D); foo.c:9:21: note: ivtmp_10 has no range info foo.c:9:21: note: i_17 has range [0x1, 0x2b] foo.c:9:21: note: can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1; foo.c:9:21: note: last_8 has no range info foo.c:9:21: note: last_16 has no range info foo.c:9:21: note: _7 has no range info foo.c:9:21: note: _5 has range [0x0, 0xa8] foo.c:9:21: note: can narrow to unsigned:8 without loss of precision: _5 = _1 * 4; foo.c:9:21: note: aval_13 has no range info foo.c:9:21: note: _2 has range [0x0, 0x54] foo.c:9:21: note: can narrow to unsigned:7 without loss of precision: _2 = _1 * 2; foo.c:9:21: note: _1 has range [0x0, 0x2a] foo.c:9:21: note: === vect_pattern_recog === foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _2 = _1 * 2; foo.c:9:21: note: widen_mult pattern recognized: patt_37 = (long unsigned int) patt_4; foo.c:9:21: note: extra pattern stmt: patt_4 = i_21 w* 2; foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _5 = _1 * 4; foo.c:9:21: note: widen_mult pattern recognized: patt_39 = (long unsigned int) patt_38; foo.c:9:21: note: extra pattern stmt: 
patt_38 = i_21 w* 4; foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction foo.c:9:21: note: === vect_analyze_data_ref_accesses === foo.c:9:21: note: === vect_mark_stmts_to_be_vectorized === foo.c:9:21: note: init: phi relevant? last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: init: phi relevant? i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: init: stmt relevant? _1 = (long unsigned int) i_21; foo.c:9:21: note: init: stmt relevant? _2 = _1 * 2; foo.c:9:21: note: init: stmt relevant? _3 = a_12(D) + _2; foo.c:9:21: note: init: stmt relevant? aval_13 = *_3; foo.c:9:21: note: init: stmt relevant? _5 = _1 * 4; foo.c:9:21: note: init: stmt relevant? _6 = b_14(D) + _5; foo.c:9:21: note: init: stmt relevant? _7 = *_6; foo.c:9:21: note: init: stmt relevant? last_16 = (int) aval_13; foo.c:9:21: note: init: stmt relevant? _9 = _7 < min_v_15(D); foo.c:9:21: note: init: stmt relevant? last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: vec_stmt_relevant_p: used out of loop. foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vec_stmt_relevant_p: stmt live but not relevant. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: init: stmt relevant? i_17 = i_21 + 1; foo.c:9:21: note: init: stmt relevant? ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: init: stmt relevant? if (ivtmp_10 != 0) foo.c:9:21: note: worklist: examine stmt: last_8 = _9 ? 
last_16 : last_19; foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction foo.c:9:21: note: reduc-stmt defining reduc-phi in the same nest. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: already marked relevant/live. foo.c:9:21: note: vect_is_simple_use: operand 108, type of def: constant foo.c:9:21: note: worklist: examine stmt: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _7 = *_6; foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: worklist: examine stmt: _7 = *_6; foo.c:9:21: note: === vect_analyze_data_ref_dependences === foo.c:9:21: note: === vect_determine_vectorization_factor === foo.c:9:21: note: ==> examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ==> 
examining statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _2 = _1 * 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_4 = i_21 w* 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_37 = (long unsigned int) patt_4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _3 = a_12(D) + _2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: aval_13 = *_3; foo.c:9:21: note: precomputed vectype: vector(8) short int foo.c:9:21: note: nunits = 8 foo.c:9:21: note: ==> examining statement: _5 = _1 * 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_38 = i_21 w* 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_39 = (long unsigned int) patt_38; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _6 = b_14(D) + _5; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _7 = *_6; foo.c:9:21: note: precomputed vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_16 = (int) aval_13; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: get vectype for smallest scalar type: short int foo.c:9:21: note: nunits vectype: vector(8) short int foo.c:9:21: note: nunits = 8 foo.c:9:21: note: ==> examining statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: vectype: vector(4) <signed-boolean:32> foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: i_17 = i_21 + 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: skip. 
foo.c:9:21: note: ==> examining statement: if (ivtmp_10 != 0) foo.c:9:21: note: skip. foo.c:9:21: note: vectorization factor = 8 foo.c:9:21: note: === vect_compute_single_scalar_iteration_cost === *_3 1 times scalar_load costs 1 in prologue *_6 1 times scalar_load costs 1 in prologue (int) aval_13 1 times scalar_stmt costs 1 in prologue _7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue _9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue foo.c:9:21: note: === vect_analyze_slp === foo.c:9:21: note: === vect_make_slp_decision === foo.c:9:21: note: vectorization_factor = 8, niters = 43 foo.c:9:21: note: === vect_analyze_data_refs_alignment === foo.c:9:21: note: recording new base alignment for a_12(D) alignment: 2 misalignment: 0 based on: aval_13 = *_3; foo.c:9:21: note: recording new base alignment for b_14(D) alignment: 4 misalignment: 0 based on: _7 = *_6; foo.c:9:21: note: vect_compute_data_ref_alignment: foo.c:9:21: note: can't force alignment of ref: *_3 foo.c:9:21: note: vect_compute_data_ref_alignment: foo.c:9:21: note: can't force alignment of ref: *_6 foo.c:9:21: note: === vect_prune_runtime_alias_test_list === foo.c:9:21: note: === vect_enhance_data_refs_alignment === foo.c:9:21: missed: Unknown misalignment, naturally aligned foo.c:9:21: missed: Unknown misalignment, naturally aligned foo.c:9:21: note: vect_can_advance_ivs_p: foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduc or virtual phi. skip. foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: vect_model_load_cost: aligned. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 3, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. 
foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 3, outside_cost = 0. foo.c:9:21: note: === vect_dissolve_slp_only_groups === foo.c:9:21: note: === vect_analyze_loop_operations === foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: missed: multiple types in double reduction or condition reduction or fold-left reduction. foo.c:4:1: missed: not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: missed: bad operation or unsupported loop bound. foo.c:9:21: note: ***** Analysis failed with vector mode V8HI foo.c:9:21: note: ***** The result for vector mode V16QI would be the same foo.c:9:21: note: ***** The result for vector mode V8QI would be the same foo.c:9:21: note: ***** Re-trying analysis with vector mode V4HI foo.c:9:21: note: === vect_analyze_data_refs === foo.c:9:21: note: got vectype for stmt: aval_13 = *_3; vector(4) short int foo.c:9:21: note: got vectype for stmt: _7 = *_6; vector(4) int foo.c:9:21: note: === vect_analyze_scalar_cycles === foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: Access function of PHI: last_19 foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Access function of PHI: {0, +, 1}_1 foo.c:9:21: note: step: 1, init: 0 foo.c:9:21: note: Detected induction. 
foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: Access function of PHI: {43, +, 4294967295}_1 foo.c:9:21: note: step: 4294967295, init: 43 foo.c:9:21: note: Detected induction. foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduction path: last_8 last_19 foo.c:9:21: note: reduction: detected reduction foo.c:9:21: note: Detected reduction. foo.c:9:21: note: === vect_determine_precisions === foo.c:9:21: note: using boolean precision 32 for _9 = _7 < min_v_15(D); foo.c:9:21: note: ivtmp_10 has no range info foo.c:9:21: note: i_17 has range [0x1, 0x2b] foo.c:9:21: note: can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1; foo.c:9:21: note: last_8 has no range info foo.c:9:21: note: last_16 has no range info foo.c:9:21: note: _7 has no range info foo.c:9:21: note: _5 has range [0x0, 0xa8] foo.c:9:21: note: can narrow to unsigned:8 without loss of precision: _5 = _1 * 4; foo.c:9:21: note: aval_13 has no range info foo.c:9:21: note: _2 has range [0x0, 0x54] foo.c:9:21: note: can narrow to unsigned:7 without loss of precision: _2 = _1 * 2; foo.c:9:21: note: _1 has range [0x0, 0x2a] foo.c:9:21: note: === vect_pattern_recog === foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _2 = _1 * 2; foo.c:9:21: note: widen_mult pattern recognized: patt_41 = (long unsigned int) patt_40; foo.c:9:21: note: extra pattern stmt: patt_40 = i_21 w* 2; foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, 
type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _5 = _1 * 4; foo.c:9:21: note: widen_mult pattern recognized: patt_43 = (long unsigned int) patt_42; foo.c:9:21: note: extra pattern stmt: patt_42 = i_21 w* 4; foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction foo.c:9:21: note: === vect_analyze_data_ref_accesses === foo.c:9:21: note: === vect_mark_stmts_to_be_vectorized === foo.c:9:21: note: init: phi relevant? last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: init: phi relevant? i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: init: stmt relevant? _1 = (long unsigned int) i_21; foo.c:9:21: note: init: stmt relevant? _2 = _1 * 2; foo.c:9:21: note: init: stmt relevant? _3 = a_12(D) + _2; foo.c:9:21: note: init: stmt relevant? aval_13 = *_3; foo.c:9:21: note: init: stmt relevant? _5 = _1 * 4; foo.c:9:21: note: init: stmt relevant? _6 = b_14(D) + _5; foo.c:9:21: note: init: stmt relevant? _7 = *_6; foo.c:9:21: note: init: stmt relevant? last_16 = (int) aval_13; foo.c:9:21: note: init: stmt relevant? _9 = _7 < min_v_15(D); foo.c:9:21: note: init: stmt relevant? last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: vec_stmt_relevant_p: used out of loop. foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vec_stmt_relevant_p: stmt live but not relevant. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: init: stmt relevant? 
i_17 = i_21 + 1; foo.c:9:21: note: init: stmt relevant? ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: init: stmt relevant? if (ivtmp_10 != 0) foo.c:9:21: note: worklist: examine stmt: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction foo.c:9:21: note: reduc-stmt defining reduc-phi in the same nest. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: already marked relevant/live. 
foo.c:9:21: note: vect_is_simple_use: operand 108, type of def: constant foo.c:9:21: note: worklist: examine stmt: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _7 = *_6; foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: worklist: examine stmt: _7 = *_6; foo.c:9:21: note: === vect_analyze_data_ref_dependences === foo.c:9:21: note: === vect_determine_vectorization_factor === foo.c:9:21: note: ==> examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ==> examining statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _2 = _1 * 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_40 = i_21 w* 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_41 = (long unsigned int) patt_40; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _3 = a_12(D) + _2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: aval_13 = *_3; foo.c:9:21: note: precomputed vectype: vector(4) short int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: _5 = _1 * 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_42 = i_21 w* 4; foo.c:9:21: note: skip. 
foo.c:9:21: note: ==> examining pattern statement: patt_43 = (long unsigned int) patt_42; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _6 = b_14(D) + _5; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _7 = *_6; foo.c:9:21: note: precomputed vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_16 = (int) aval_13; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: get vectype for smallest scalar type: short int foo.c:9:21: note: nunits vectype: vector(4) short int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: vectype: vector(4) <signed-boolean:32> foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: i_17 = i_21 + 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: if (ivtmp_10 != 0) foo.c:9:21: note: skip. foo.c:9:21: note: vectorization factor = 4 foo.c:9:21: note: === vect_compute_single_scalar_iteration_cost === *_3 1 times scalar_load costs 1 in prologue *_6 1 times scalar_load costs 1 in prologue (int) aval_13 1 times scalar_stmt costs 1 in prologue _7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue _9 ? 
last_16 : last_19 1 times scalar_stmt costs 1 in prologue foo.c:9:21: note: === vect_analyze_slp === foo.c:9:21: note: === vect_make_slp_decision === foo.c:9:21: note: vectorization_factor = 4, niters = 43 foo.c:9:21: note: === vect_analyze_data_refs_alignment === foo.c:9:21: note: recording new base alignment for a_12(D) alignment: 2 misalignment: 0 based on: aval_13 = *_3; foo.c:9:21: note: recording new base alignment for b_14(D) alignment: 4 misalignment: 0 based on: _7 = *_6; foo.c:9:21: note: vect_compute_data_ref_alignment: foo.c:9:21: note: can't force alignment of ref: *_3 foo.c:9:21: note: vect_compute_data_ref_alignment: foo.c:9:21: note: can't force alignment of ref: *_6 foo.c:9:21: note: === vect_prune_runtime_alias_test_list === foo.c:9:21: note: === vect_enhance_data_refs_alignment === foo.c:9:21: missed: Unknown misalignment, naturally aligned foo.c:9:21: missed: Unknown misalignment, naturally aligned foo.c:9:21: note: vect_can_advance_ivs_p: foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduc or virtual phi. skip. foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: vect_model_load_cost: aligned. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0. 
foo.c:9:21: note: === vect_dissolve_slp_only_groups === foo.c:9:21: note: === vect_analyze_loop_operations === foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int Estimating # of iterations of loop 1 Analyzing # of iterations of loop 1 exit condition [42, + , 4294967295] != 0 bounds on difference of bases: -42 ... -42 result: # of iterations 42, bounded by 42 Analyzing # of iterations of loop 1 exit condition [42, + , 4294967295] != 0 bounds on difference of bases: -42 ... -42 result: # of iterations 42, bounded by 42 Statement (exit)if (ivtmp_10 != 0) is executed at most 42 (bounded by 42) + 1 times in loop 1. Induction variable (short int *) a_12(D) + 2 * iteration does not wrap in statement _3 = a_12(D) + _2; in loop 1. Statement _3 = a_12(D) + _2; is executed at most 9223372036854775806 (bounded by 9223372036854775806) + 1 times in loop 1. Induction variable (int *) b_14(D) + 4 * iteration does not wrap in statement _6 = b_14(D) + _5; in loop 1. Statement _6 = b_14(D) + _5; is executed at most 4611686018427387902 (bounded by 4611686018427387902) + 1 times in loop 1. Induction variable (int) 1 + 1 * iteration does not wrap in statement i_17 = i_21 + 1; in loop 1. Statement i_17 = i_21 + 1; is executed at most 42 (bounded by 42) + 1 times in loop 1. vect_model_reduction_cost: inside_cost = 0, prologue_cost = 4, epilogue_cost = 7 . foo.c:9:21: note: examining phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ==> examining statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: _2 = _1 * 2; foo.c:9:21: note: irrelevant. 
foo.c:9:21: note: ==> examining statement: _3 = a_12(D) + _2; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: aval_13 = *_3; foo.c:9:21: missed: can't operate on partial vectors because the target doesn't have the appropriate partial vectorization load or store. foo.c:9:21: note: Vectorizing an unaligned access. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0 . foo.c:9:21: note: ==> examining statement: _5 = _1 * 4; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: _6 = b_14(D) + _5; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: _7 = *_6; foo.c:9:21: note: Vectorizing an unaligned access. foo.c:9:21: note: vect_model_load_cost: unaligned supported by hardware. foo.c:9:21: note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0 . foo.c:9:21: note: ==> examining statement: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) short int foo.c:9:21: note: === vectorizable_conversion === foo.c:9:21: note: vect_model_simple_cost: inside_cost = 1, prologue_cost = 0 . foo.c:9:21: note: ==> examining statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: vect_model_simple_cost: inside_cost = 1, prologue_cost = 1 . foo.c:9:21: note: ==> examining statement: last_8 = _9 ? 
last_16 : last_19; foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) <signed-boolean:32> foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_model_simple_cost: inside_cost = 1, prologue_cost = 0 . foo.c:9:21: note: ==> examining statement: i_17 = i_21 + 1; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: irrelevant. foo.c:9:21: note: ==> examining statement: if (ivtmp_10 != 0) foo.c:9:21: note: irrelevant. _9 ? last_16 : last_19 4 times scalar_to_vec costs 4 in prologue _9 ? last_16 : last_19 2 times vector_stmt costs 2 in epilogue _9 ? last_16 : last_19 2 times vec_to_scalar costs 4 in epilogue _9 ? last_16 : last_19 1 times scalar_to_vec costs 1 in epilogue *_3 1 times unaligned_load (misalign -1) costs 1 in body *_6 1 times unaligned_load (misalign -1) costs 1 in body (int) aval_13 1 times vector_stmt costs 1 in body _7 < min_v_15(D) 1 times scalar_to_vec costs 1 in prologue _7 < min_v_15(D) 1 times vector_stmt costs 1 in body _9 ? last_16 : last_19 1 times vector_stmt costs 1 in body foo.c:9:21: note: operating on full vectors. foo.c:9:21: note: cost model disabled. foo.c:9:21: note: epilog loop required foo.c:9:21: note: vect_can_advance_ivs_p: foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduc or virtual phi. skip. 
foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ***** Analysis succeeded with vector mode V4HI foo.c:9:21: note: ***** Choosing vector mode V4HI foo.c:9:21: note: ***** Re-trying epilogue analysis with vector mode V16QI foo.c:9:21: note: === vect_analyze_data_refs === foo.c:9:21: note: got vectype for stmt: aval_13 = *_3; vector(8) short int foo.c:9:21: note: got vectype for stmt: _7 = *_6; vector(4) int foo.c:9:21: note: === vect_analyze_scalar_cycles === foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: Access function of PHI: last_19 foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Access function of PHI: {0, +, 1}_1 foo.c:9:21: note: step: 1, init: 0 foo.c:9:21: note: Detected induction. foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: Access function of PHI: {43, +, 4294967295}_1 foo.c:9:21: note: step: 4294967295, init: 43 foo.c:9:21: note: Detected induction. foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduction path: last_8 last_19 foo.c:9:21: note: reduction: detected reduction foo.c:9:21: note: Detected reduction. 
foo.c:9:21: note: === vect_determine_precisions === foo.c:9:21: note: using boolean precision 32 for _9 = _7 < min_v_15(D); foo.c:9:21: note: ivtmp_10 has no range info foo.c:9:21: note: i_17 has range [0x1, 0x2b] foo.c:9:21: note: can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1; foo.c:9:21: note: last_8 has no range info foo.c:9:21: note: last_16 has no range info foo.c:9:21: note: _7 has no range info foo.c:9:21: note: _5 has range [0x0, 0xa8] foo.c:9:21: note: can narrow to unsigned:8 without loss of precision: _5 = _1 * 4; foo.c:9:21: note: aval_13 has no range info foo.c:9:21: note: _2 has range [0x0, 0x54] foo.c:9:21: note: can narrow to unsigned:7 without loss of precision: _2 = _1 * 2; foo.c:9:21: note: _1 has range [0x0, 0x2a] foo.c:9:21: note: === vect_pattern_recog === foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _2 = _1 * 2; foo.c:9:21: note: widen_mult pattern recognized: patt_45 = (long unsigned int) patt_44; foo.c:9:21: note: extra pattern stmt: patt_44 = i_21 w* 2; foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _5 = _1 * 4; foo.c:9:21: note: widen_mult pattern recognized: patt_47 = (long unsigned int) patt_46; foo.c:9:21: note: extra pattern stmt: 
patt_46 = i_21 w* 4; foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction foo.c:9:21: note: === vect_analyze_data_ref_accesses === foo.c:9:21: note: === vect_mark_stmts_to_be_vectorized === foo.c:9:21: note: init: phi relevant? last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: init: phi relevant? i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: init: stmt relevant? _1 = (long unsigned int) i_21; foo.c:9:21: note: init: stmt relevant? _2 = _1 * 2; foo.c:9:21: note: init: stmt relevant? _3 = a_12(D) + _2; foo.c:9:21: note: init: stmt relevant? aval_13 = *_3; foo.c:9:21: note: init: stmt relevant? _5 = _1 * 4; foo.c:9:21: note: init: stmt relevant? _6 = b_14(D) + _5; foo.c:9:21: note: init: stmt relevant? _7 = *_6; foo.c:9:21: note: init: stmt relevant? last_16 = (int) aval_13; foo.c:9:21: note: init: stmt relevant? _9 = _7 < min_v_15(D); foo.c:9:21: note: init: stmt relevant? last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: vec_stmt_relevant_p: used out of loop. foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vec_stmt_relevant_p: stmt live but not relevant. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: init: stmt relevant? i_17 = i_21 + 1; foo.c:9:21: note: init: stmt relevant? ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: init: stmt relevant? if (ivtmp_10 != 0) foo.c:9:21: note: worklist: examine stmt: last_8 = _9 ? 
last_16 : last_19; foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction foo.c:9:21: note: reduc-stmt defining reduc-phi in the same nest. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: already marked relevant/live. foo.c:9:21: note: vect_is_simple_use: operand 108, type of def: constant foo.c:9:21: note: worklist: examine stmt: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _7 = *_6; foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: worklist: examine stmt: _7 = *_6; foo.c:9:21: note: === vect_analyze_data_ref_dependences === foo.c:9:21: note: === vect_determine_vectorization_factor === foo.c:9:21: note: ==> examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ==> 
examining statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _2 = _1 * 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_44 = i_21 w* 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_45 = (long unsigned int) patt_44; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _3 = a_12(D) + _2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: aval_13 = *_3; foo.c:9:21: note: precomputed vectype: vector(8) short int foo.c:9:21: note: nunits = 8 foo.c:9:21: note: ==> examining statement: _5 = _1 * 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern def stmt: patt_46 = i_21 w* 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_47 = (long unsigned int) patt_46; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _6 = b_14(D) + _5; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _7 = *_6; foo.c:9:21: note: precomputed vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_16 = (int) aval_13; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: get vectype for smallest scalar type: short int foo.c:9:21: note: nunits vectype: vector(8) short int foo.c:9:21: note: nunits = 8 foo.c:9:21: note: ==> examining statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: vectype: vector(4) <signed-boolean:32> foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(4) int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: i_17 = i_21 + 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: skip. 
foo.c:9:21: note: ==> examining statement: if (ivtmp_10 != 0)
foo.c:9:21: note: skip.
foo.c:9:21: note: vectorization factor = 8
foo.c:9:21: note: === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note: === vect_analyze_slp ===
foo.c:9:21: note: === vect_make_slp_decision ===
foo.c:9:21: note: vectorization_factor = 8, niters = 43
foo.c:9:21: note: === vect_analyze_data_refs_alignment ===
foo.c:9:21: note: recording new base alignment for a_12(D)
  alignment: 2
  misalignment: 0
  based on: aval_13 = *_3;
foo.c:9:21: note: recording new base alignment for b_14(D)
  alignment: 4
  misalignment: 0
  based on: _7 = *_6;
foo.c:9:21: note: vect_compute_data_ref_alignment:
foo.c:9:21: note: can't force alignment of ref: *_3
foo.c:9:21: note: vect_compute_data_ref_alignment:
foo.c:9:21: note: can't force alignment of ref: *_6
foo.c:9:21: note: === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note: === vect_dissolve_slp_only_groups ===
foo.c:9:21: note: === vect_analyze_loop_operations ===
foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int
foo.c:9:21: missed: multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed: not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: missed: bad operation or unsupported loop bound.
foo.c:9:21: note: ***** Analysis failed with vector mode V16QI
foo.c:9:21: note: ***** The result for vector mode V8QI would be the same
foo.c:9:21: note: ***** Re-trying epilogue analysis with vector mode V2SI
foo.c:9:21: note: === vect_analyze_data_refs ===
foo.c:9:21: note: got vectype for stmt: aval_13 = *_3; vector(4) short int
foo.c:9:21: note: got vectype for stmt: _7 = *_6; vector(2) int
foo.c:9:21: note: === vect_analyze_scalar_cycles ===
foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note: Access function of PHI: last_19
foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)>
foo.c:9:21: note: Access function of PHI: {0, +, 1}_1
foo.c:9:21: note: step: 1, init: 0
foo.c:9:21: note: Detected induction.
foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
foo.c:9:21: note: Access function of PHI: {43, +, 4294967295}_1
foo.c:9:21: note: step: 4294967295, init: 43
foo.c:9:21: note: Detected induction.
foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note: reduction path: last_8 last_19
foo.c:9:21: note: reduction: detected reduction
foo.c:9:21: note: Detected reduction.
foo.c:9:21: note: === vect_determine_precisions === foo.c:9:21: note: using boolean precision 32 for _9 = _7 < min_v_15(D); foo.c:9:21: note: ivtmp_10 has no range info foo.c:9:21: note: i_17 has range [0x1, 0x2b] foo.c:9:21: note: can narrow to unsigned:6 without loss of precision: i_17 = i_21 + 1; foo.c:9:21: note: last_8 has no range info foo.c:9:21: note: last_16 has no range info foo.c:9:21: note: _7 has no range info foo.c:9:21: note: _5 has range [0x0, 0xa8] foo.c:9:21: note: can narrow to unsigned:8 without loss of precision: _5 = _1 * 4; foo.c:9:21: note: aval_13 has no range info foo.c:9:21: note: _2 has range [0x0, 0x54] foo.c:9:21: note: can narrow to unsigned:7 without loss of precision: _2 = _1 * 2; foo.c:9:21: note: _1 has range [0x0, 0x2a] foo.c:9:21: note: === vect_pattern_recog === foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _2 = _1 * 2; foo.c:9:21: note: vect_recog_mult_pattern: detected: _2 = _1 * 2; foo.c:9:21: note: mult pattern recognized: patt_48 = _1 << 1; foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand (long unsigned int) i_21, type of def: internal foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_recog_widen_mult_pattern: detected: _5 = _1 * 4; foo.c:9:21: note: vect_recog_mult_pattern: detected: _5 = _1 * 4; foo.c:9:21: note: mult pattern recognized: patt_49 = _1 << 2; foo.c:9:21: note: 
vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand i_21 = PHI <i_17(7), 0(15)>, type of def: induction foo.c:9:21: note: vect_is_simple_use: operand ivtmp_18 = PHI <ivtmp_10(7), 43(15)>, type of def: induction foo.c:9:21: note: === vect_analyze_data_ref_accesses === foo.c:9:21: note: === vect_mark_stmts_to_be_vectorized === foo.c:9:21: note: init: phi relevant? last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: init: phi relevant? i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: init: phi relevant? ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: init: stmt relevant? _1 = (long unsigned int) i_21; foo.c:9:21: note: init: stmt relevant? _2 = _1 * 2; foo.c:9:21: note: init: stmt relevant? _3 = a_12(D) + _2; foo.c:9:21: note: init: stmt relevant? aval_13 = *_3; foo.c:9:21: note: init: stmt relevant? _5 = _1 * 4; foo.c:9:21: note: init: stmt relevant? _6 = b_14(D) + _5; foo.c:9:21: note: init: stmt relevant? _7 = *_6; foo.c:9:21: note: init: stmt relevant? last_16 = (int) aval_13; foo.c:9:21: note: init: stmt relevant? _9 = _7 < min_v_15(D); foo.c:9:21: note: init: stmt relevant? last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: vec_stmt_relevant_p: used out of loop. foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vec_stmt_relevant_p: stmt live but not relevant. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: init: stmt relevant? i_17 = i_21 + 1; foo.c:9:21: note: init: stmt relevant? ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: init: stmt relevant? if (ivtmp_10 != 0) foo.c:9:21: note: worklist: examine stmt: last_8 = _9 ? 
last_16 : last_19; foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction foo.c:9:21: note: mark relevant 1, live 0: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: worklist: examine stmt: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: vect_is_simple_use: operand _9 ? last_16 : last_19, type of def: reduction foo.c:9:21: note: reduc-stmt defining reduc-phi in the same nest. foo.c:9:21: note: mark relevant 1, live 1: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: already marked relevant/live. foo.c:9:21: note: vect_is_simple_use: operand 108, type of def: constant foo.c:9:21: note: worklist: examine stmt: last_16 = (int) aval_13; foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: aval_13 = *_3; foo.c:9:21: note: worklist: examine stmt: _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: mark relevant 1, live 0: _7 = *_6; foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: worklist: examine stmt: _7 = *_6; foo.c:9:21: note: === vect_analyze_data_ref_dependences === foo.c:9:21: note: === vect_determine_vectorization_factor === foo.c:9:21: note: ==> examining phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(2) int foo.c:9:21: note: nunits = 2 foo.c:9:21: note: ==> examining phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: ==> examining phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: ==> 
examining statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _2 = _1 * 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_48 = _1 << 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _3 = a_12(D) + _2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: aval_13 = *_3; foo.c:9:21: note: precomputed vectype: vector(4) short int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: _5 = _1 * 4; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining pattern statement: patt_49 = _1 << 2; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _6 = b_14(D) + _5; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: _7 = *_6; foo.c:9:21: note: precomputed vectype: vector(2) int foo.c:9:21: note: nunits = 2 foo.c:9:21: note: ==> examining statement: last_16 = (int) aval_13; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(2) int foo.c:9:21: note: get vectype for smallest scalar type: short int foo.c:9:21: note: nunits vectype: vector(4) short int foo.c:9:21: note: nunits = 4 foo.c:9:21: note: ==> examining statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: vectype: vector(2) <signed-boolean:32> foo.c:9:21: note: nunits = 2 foo.c:9:21: note: ==> examining statement: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: get vectype for scalar type: int foo.c:9:21: note: vectype: vector(2) int foo.c:9:21: note: nunits = 2 foo.c:9:21: note: ==> examining statement: i_17 = i_21 + 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: skip. foo.c:9:21: note: ==> examining statement: if (ivtmp_10 != 0) foo.c:9:21: note: skip. 
foo.c:9:21: note: vectorization factor = 4
foo.c:9:21: note: === vect_compute_single_scalar_iteration_cost ===
*_3 1 times scalar_load costs 1 in prologue
*_6 1 times scalar_load costs 1 in prologue
(int) aval_13 1 times scalar_stmt costs 1 in prologue
_7 < min_v_15(D) 1 times scalar_stmt costs 1 in prologue
_9 ? last_16 : last_19 1 times scalar_stmt costs 1 in prologue
foo.c:9:21: note: === vect_analyze_slp ===
foo.c:9:21: note: === vect_make_slp_decision ===
foo.c:9:21: note: vectorization_factor = 4, niters = 43
foo.c:9:21: note: === vect_analyze_data_refs_alignment ===
foo.c:9:21: note: recording new base alignment for a_12(D)
  alignment: 2
  misalignment: 0
  based on: aval_13 = *_3;
foo.c:9:21: note: recording new base alignment for b_14(D)
  alignment: 4
  misalignment: 0
  based on: _7 = *_6;
foo.c:9:21: note: vect_compute_data_ref_alignment:
foo.c:9:21: note: can't force alignment of ref: *_3
foo.c:9:21: note: vect_compute_data_ref_alignment:
foo.c:9:21: note: can't force alignment of ref: *_6
foo.c:9:21: note: === vect_prune_runtime_alias_test_list ===
foo.c:9:21: note: === vect_dissolve_slp_only_groups ===
foo.c:9:21: note: === vect_analyze_loop_operations ===
foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal
foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int
foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int
foo.c:9:21: missed: multiple types in double reduction or condition reduction or fold-left reduction.
foo.c:4:1: missed: not vectorized: relevant phi not supported: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: missed: bad operation or unsupported loop bound.
foo.c:9:21: note: ***** Analysis failed with vector mode V2SI foo.c:9:21: optimized: loop vectorized using 8 byte vectors foo.c:9:21: note: === vec_transform_loop === split exit edge split exit edge of scalar loop Removing basic block 19 ;; basic block 19, loop depth 0 ;; pred: 16 ;; succ: foo.c:9:21: note: vect_can_advance_ivs_p: foo.c:9:21: note: Analyze phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduc or virtual phi. skip. foo.c:9:21: note: Analyze phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: Analyze phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> foo.c:9:21: note: vect_update_ivs_after_vectorizer: phi: last_19 = PHI <last_8(7), 108(15)> foo.c:9:21: note: reduc or virtual phi. skip. foo.c:9:21: note: vect_update_ivs_after_vectorizer: phi: i_21 = PHI <i_17(7), 0(15)> foo.c:9:21: note: vect_update_ivs_after_vectorizer: phi: ivtmp_18 = PHI <ivtmp_10(7), 43(15)> ;; Guessed iterations of loop 3 is 42.052870. New upper bound 2. ;; Scaling loop 3 with scale 7.0% (guessed) to reach upper bound 2 foo.c:9:21: note: ------>vectorizing phi: last_19 = PHI <last_8(7), 108(25)> foo.c:9:21: note: transform phi. foo.c:9:21: note: ------>vectorizing phi: i_21 = PHI <i_17(7), 0(25)> foo.c:9:21: note: ------>vectorizing phi: ivtmp_18 = PHI <ivtmp_10(7), 43(25)> foo.c:9:21: note: ------>vectorizing phi: vect_last_19.7_67 = PHI <(7), { 108, 108, 108, 108 }(25)> foo.c:9:21: note: ------>vectorizing statement: _1 = (long unsigned int) i_21; foo.c:9:21: note: ------>vectorizing statement: patt_40 = i_21 w* 2; foo.c:9:21: note: ------>vectorizing statement: patt_41 = (long unsigned int) patt_40; foo.c:9:21: note: ------>vectorizing statement: _3 = a_12(D) + _2; foo.c:9:21: note: ------>vectorizing statement: aval_13 = *_3; foo.c:9:21: note: transform statement. foo.c:9:21: note: transform load. 
ncopies = 1 foo.c:9:21: note: create vector_type-pointer variable to type: vector(4) short int vectorizing a pointer ref: *a_12(D) foo.c:9:21: note: created a_12(D) foo.c:9:21: note: add new stmt: vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68]; foo.c:9:21: note: ------>vectorizing statement: patt_42 = i_21 w* 4; foo.c:9:21: note: ------>vectorizing statement: patt_43 = (long unsigned int) patt_42; foo.c:9:21: note: ------>vectorizing statement: _6 = b_14(D) + _5; foo.c:9:21: note: ------>vectorizing statement: _7 = *_6; foo.c:9:21: note: transform statement. foo.c:9:21: note: transform load. ncopies = 1 foo.c:9:21: note: create vector_type-pointer variable to type: vector(4) int vectorizing a pointer ref: *b_14(D) foo.c:9:21: note: created b_14(D) foo.c:9:21: note: add new stmt: vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71]; foo.c:9:21: note: ------>vectorizing statement: last_16 = (int) aval_13; foo.c:9:21: note: transform statement. foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) short int foo.c:9:21: note: transform conversion. ncopies = 1. foo.c:9:21: note: vect_get_vec_defs_for_operand: aval_13 foo.c:9:21: note: vect_is_simple_use: operand *_3, type of def: internal foo.c:9:21: note: def_stmt = aval_13 = *_3; foo.c:9:21: note: add new stmt: vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70; foo.c:9:21: note: ------>vectorizing statement: _9 = _7 < min_v_15(D); foo.c:9:21: note: transform statement. 
foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: vect_get_vec_defs_for_operand: _7 foo.c:9:21: note: vect_is_simple_use: operand *_6, type of def: internal foo.c:9:21: note: def_stmt = _7 = *_6; foo.c:9:21: note: vect_get_vec_defs_for_operand: min_v_15(D) foo.c:9:21: note: vect_is_simple_use: operand min_v_15(D), type of def: external foo.c:9:21: note: created new init_stmt: vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)}; foo.c:9:21: note: add new stmt: mask__9.15_76 = vect__7.13_73 < vect_cst__75; foo.c:9:21: note: ------>vectorizing statement: last_8 = _9 ? last_16 : last_19; foo.c:9:21: note: transform statement. foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) <signed-boolean:32> foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(25)>, type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_get_vec_defs_for_operand: _9 foo.c:9:21: note: vect_is_simple_use: operand _7 < min_v_15(D), type of def: internal foo.c:9:21: note: def_stmt = _9 = _7 < min_v_15(D); foo.c:9:21: note: vect_get_vec_defs_for_operand: last_16 foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: def_stmt = last_16 = (int) aval_13; foo.c:9:21: note: vect_get_vec_defs_for_operand: last_19 foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(25)>, type of def: reduction foo.c:9:21: note: def_stmt = last_19 = PHI <last_8(7), 108(25)> foo.c:9:21: note: add new stmt: vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, 
vect_last_16.14_74, vect_last_19.7_67>; foo.c:9:21: note: ------>vectorizing statement: i_17 = i_21 + 1; foo.c:9:21: note: ------>vectorizing statement: ivtmp_10 = ivtmp_18 - 1; foo.c:9:21: note: ------>vectorizing statement: if (ivtmp_10 != 0) foo.c:9:21: note: New loop exit condition: if (ivtmp_91 < 10) ;; Scaling loop 1 with scale 25.0% (adjusted) ;; Guessed iterations of loop 1 is 9.763217. New upper bound 9. ;; Scaling loop 1 with scale 92.9% (guessed) to reach upper bound 9 foo.c:9:21: note: LOOP VECTORIZED foo.c:4:1: note: vectorized 1 loops in function. ;; Created LCSSA PHI: _92 = PHI <_81(3)> Updating SSA: Registering new PHI nodes in block #3 Updating SSA information for statement _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>; Registering new PHI nodes in block #7 Registering new PHI nodes in block #20 Updating SSA information for statement _83 = .REDUC_MAX (_81); Updating SSA information for statement _85 = _81 == _84; Registering new PHI nodes in block #21 SSA replacement table N_i -> { O_1 ... 
O_j } means that N_i replaces O_1, ..., O_j
_92 -> { _81 }
Incremental SSA update started at block: 3
Number of blocks in CFG: 26
Number of blocks to update: 3 ( 12%)
Affected blocks: 3 7 20
Processing block 0: BB25
Value numbering stmt = vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)};
Setting value number of vect_cst__75 to vect_cst__75 (changed)
marking outgoing edge 25 -> 3 executable
Making available beyond BB25 vect_cst__75 for value vect_cst__75
Processing block 1: BB3
Cannot trust state of predecessor edge 7 -> 3, marking executable
Value numbering stmt = last_19 = PHI <last_8(7), 108(25)>
Setting value number of last_19 to last_19 (changed)
Making available beyond BB3 last_19 for value last_19
Value numbering stmt = i_21 = PHI <i_17(7), 0(25)>
Setting value number of i_21 to i_21 (changed)
Making available beyond BB3 i_21 for value i_21
Value numbering stmt = ivtmp_18 = PHI <ivtmp_10(7), 43(25)>
Setting value number of ivtmp_18 to ivtmp_18 (changed)
Making available beyond BB3 ivtmp_18 for value ivtmp_18
Value numbering stmt = vect_last_19.7_67 = PHI <vect_last_8.16_77(7), { 108, 108, 108, 108 }(25)>
Setting value number of vect_last_19.7_67 to vect_last_19.7_67 (changed)
Making available beyond BB3 vect_last_19.7_67 for value vect_last_19.7_67
Value numbering stmt = vectp_a.8_68 = PHI <vectp_a.8_69(7), a_12(D)(25)>
Setting value number of vectp_a.8_68 to vectp_a.8_68 (changed)
Making available beyond BB3 vectp_a.8_68 for value vectp_a.8_68
Value numbering stmt = vectp_b.11_71 = PHI <vectp_b.11_72(7), b_14(D)(25)>
Setting value number of vectp_b.11_71 to vectp_b.11_71 (changed)
Making available beyond BB3 vectp_b.11_71 for value vectp_b.11_71
Value numbering stmt = ivtmp_78 = PHI <ivtmp_79(7), { 1, 2, 3, 4 }(25)>
Setting value number of ivtmp_78 to ivtmp_78 (changed)
Making available beyond BB3 ivtmp_78 for value ivtmp_78
Value numbering stmt = _80 = PHI <_81(7), { 0, 0, 0, 0 }(25)>
Setting value number of _80 to _80 (changed)
Making available beyond BB3 _80 for value _80
Value numbering stmt = ivtmp_90 = PHI <ivtmp_91(7), 0(25)>
Setting value number of ivtmp_90 to ivtmp_90 (changed)
Making available beyond BB3 ivtmp_90 for value ivtmp_90
Value numbering stmt = _1 = (long unsigned int) i_21;
Setting value number of _1 to _1 (changed)
Making available beyond BB3 _1 for value _1
Value numbering stmt = _2 = _1 * 2;
Setting value number of _2 to _2 (changed)
Making available beyond BB3 _2 for value _2
Value numbering stmt = _3 = a_12(D) + _2;
Setting value number of _3 to _3 (changed)
Making available beyond BB3 _3 for value _3
Value numbering stmt = vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68];
Setting value number of vect_aval_13.10_70 to vect_aval_13.10_70 (changed)
Making available beyond BB3 vect_aval_13.10_70 for value vect_aval_13.10_70
Value numbering stmt = aval_13 = *_3;
Setting value number of aval_13 to aval_13 (changed)
Making available beyond BB3 aval_13 for value aval_13
Value numbering stmt = _5 = _1 * 4;
Setting value number of _5 to _5 (changed)
Making available beyond BB3 _5 for value _5
Value numbering stmt = _6 = b_14(D) + _5;
Setting value number of _6 to _6 (changed)
Making available beyond BB3 _6 for value _6
Value numbering stmt = vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71];
Setting value number of vect__7.13_73 to vect__7.13_73 (changed)
Making available beyond BB3 vect__7.13_73 for value vect__7.13_73
Value numbering stmt = _7 = *_6;
Setting value number of _7 to _7 (changed)
Making available beyond BB3 _7 for value _7
Value numbering stmt = vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70;
Setting value number of vect_last_16.14_74 to vect_last_16.14_74 (changed)
Making available beyond BB3 vect_last_16.14_74 for value vect_last_16.14_74
Value numbering stmt = last_16 = (int) aval_13;
Setting value number of last_16 to last_16 (changed)
Making available beyond BB3 last_16 for value last_16
Value numbering stmt = mask__9.15_76 = vect__7.13_73 < vect_cst__75;
Setting value number of mask__9.15_76 to mask__9.15_76 (changed)
Making available beyond BB3 mask__9.15_76 for value mask__9.15_76
Value numbering stmt = _9 = _7 < min_v_15(D);
Setting value number of _9 to _9 (changed)
Making available beyond BB3 _9 for value _9
Value numbering stmt = vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, vect_last_16.14_74, vect_last_19.7_67>;
Setting value number of vect_last_8.16_77 to vect_last_8.16_77 (changed)
Making available beyond BB3 vect_last_8.16_77 for value vect_last_8.16_77
Value numbering stmt = last_8 = _9 ? last_16 : last_19;
Setting value number of last_8 to last_8 (changed)
Making available beyond BB3 last_8 for value last_8
Value numbering stmt = i_17 = i_21 + 1;
Setting value number of i_17 to i_17 (changed)
Making available beyond BB3 i_17 for value i_17
Value numbering stmt = ivtmp_10 = ivtmp_18 - 1;
Setting value number of ivtmp_10 to ivtmp_10 (changed)
Making available beyond BB3 ivtmp_10 for value ivtmp_10
Value numbering stmt = vectp_a.8_69 = vectp_a.8_68 + 8;
Setting value number of vectp_a.8_69 to vectp_a.8_69 (changed)
Making available beyond BB3 vectp_a.8_69 for value vectp_a.8_69
Value numbering stmt = vectp_b.11_72 = vectp_b.11_71 + 16;
Setting value number of vectp_b.11_72 to vectp_b.11_72 (changed)
Making available beyond BB3 vectp_b.11_72 for value vectp_b.11_72
Value numbering stmt = _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>;
Setting value number of _81 to _81 (changed)
Making available beyond BB3 _81 for value _81
Value numbering stmt = ivtmp_79 = ivtmp_78 + { 4, 4, 4, 4 };
Setting value number of ivtmp_79 to ivtmp_79 (changed)
Making available beyond BB3 ivtmp_79 for value ivtmp_79
Value numbering stmt = ivtmp_91 = ivtmp_90 + 1;
Setting value number of ivtmp_91 to ivtmp_91 (changed)
Making available beyond BB3 ivtmp_91 for value ivtmp_91
Value numbering stmt = if (ivtmp_91 < 10)
Recording on edge 3->7 ivtmp_91 lt_expr 10 == true
Recording on
edge 3->7 ivtmp_91 ge_expr 10 == false
Recording on edge 3->7 ivtmp_91 ne_expr 10 == true
Recording on edge 3->7 ivtmp_91 le_expr 10 == true
Recording on edge 3->7 ivtmp_91 gt_expr 10 == false
Recording on edge 3->7 ivtmp_91 eq_expr 10 == false
marking outgoing edge 3 -> 7 executable
marking destination block 20 reachable
Processing block 2: BB7
RPO iteration over 3 blocks visited 3 blocks in total discovering 3 executable blocks iterating 1.0 times, a block was visited max. 1 times
RPO tracked 35 values available at 32 locations and 35 lattice elements
Removing basic block 9
;; basic block 9, loop depth 1
;;  pred:       16
;;              13
# last_23 = PHI <108(16), last_34(13)>
# i_24 = PHI <0(16), i_35(13)>
# ivtmp_25 = PHI <43(16), ivtmp_36(13)>
_26 = (long unsigned int) i_24;
_27 = _26 * 2;
_28 = a_12(D) + _27;
aval_29 = *_28;
_30 = _26 * 4;
_31 = b_14(D) + _30;
_32 = *_31;
if (_32 < min_v_15(D))
  goto <bb 11>; [50.00%]
else
  goto <bb 12>; [50.00%]
;;  succ:       11
;;              12

Removing basic block 11
;; basic block 11, loop depth 1
;;  pred:
last_33 = (int) _29;
;;  succ:       12

Removing basic block 12
;; basic block 12, loop depth 1
;;  pred:
# last_34 = PHI <>
i_35 = _24 + 1;
ivtmp_36 = _25 - 1;
if (ivtmp_36 != 0)
  goto <bb 13>; [97.68%]
else
  goto <bb 18>; [2.32%]
;;  succ:       13
;;              18

Removing basic block 13
;; basic block 13, loop depth 1
;;  pred:
;;  succ:

Removing basic block 16
;; basic block 16, loop depth 0
;;  pred:
;;  succ:

Removing basic block 18
;; basic block 18, loop depth 0
;;  pred:
# last_51 = PHI <>
goto <bb 6>; [100.00%]
;;  succ:       6

Merging blocks 2 and 15
Merging blocks 17 and 6
Merging blocks 2 and 25
fix_loop_structure: fixing up loops for function
fix_loop_structure: removing loop 2
__attribute__((noipa, noinline, noclone, no_icf))
int condition_reduction (short int * a, int min_v, int * b)
{
  int stmp_last_8.17;
  vector(4) int vect_last_8.16;
  vector(4) <signed-boolean:32> mask__9.15;
  vector(4) int vect_last_16.14;
  vector(4) int vect__7.13;
  int * vectp_b.12;
  vector(4) int * vectp_b.11;
  vector(4) short int vect_aval_13.10;
  short int * vectp_a.9;
  vector(4) short int * vectp_a.8;
  vector(4) int vect_last_19.7;
  unsigned int tmp.6;
  int tmp.5;
  int i;
  short int aval;
  int last;
  long unsigned int _1;
  long unsigned int _2;
  short int * _3;
  long unsigned int _5;
  int * _6;
  int _7;
  _Bool _9;
  unsigned int ivtmp_10;
  unsigned int ivtmp_18;
  _Bool _22;
  unsigned int ivtmp_54;
  long unsigned int _55;
  long unsigned int _56;
  short int * _57;
  long unsigned int _59;
  int * _60;
  int _61;
  unsigned int ivtmp_64;
  vector(4) int vect_cst__75;
  vector(4) unsigned int ivtmp_78;
  vector(4) unsigned int ivtmp_79;
  vector(4) unsigned int _80;
  vector(4) unsigned int _81;
  unsigned int _83;
  vector(4) unsigned int _84;
  vector(4) <signed-boolean:32> _85;
  vector(4) int _86;
  vector(4) unsigned int _87;
  unsigned int _88;
  int _89;
  unsigned int ivtmp_90;
  unsigned int ivtmp_91;
  vector(4) unsigned int _92;

  <bb 2> [local count: 24373936]:
  _22 = 1;
  vect_cst__75 = {min_v_15(D), min_v_15(D), min_v_15(D), min_v_15(D)};

  <bb 3> [local count: 243739360]:
  # last_19 = PHI <last_8(7), 108(2)>
  # i_21 = PHI <i_17(7), 0(2)>
  # ivtmp_18 = PHI <ivtmp_10(7), 43(2)>
  # vect_last_19.7_67 = PHI <vect_last_8.16_77(7), { 108, 108, 108, 108 }(2)>
  # vectp_a.8_68 = PHI <vectp_a.8_69(7), a_12(D)(2)>
  # vectp_b.11_71 = PHI <vectp_b.11_72(7), b_14(D)(2)>
  # ivtmp_78 = PHI <ivtmp_79(7), { 1, 2, 3, 4 }(2)>
  # _80 = PHI <_81(7), { 0, 0, 0, 0 }(2)>
  # ivtmp_90 = PHI <ivtmp_91(7), 0(2)>
  _1 = (long unsigned int) i_21;
  _2 = _1 * 2;
  _3 = a_12(D) + _2;
  vect_aval_13.10_70 = MEM <vector(4) short int> [(short int *)vectp_a.8_68];
  aval_13 = *_3;
  _5 = _1 * 4;
  _6 = b_14(D) + _5;
  vect__7.13_73 = MEM <vector(4) int> [(int *)vectp_b.11_71];
  _7 = *_6;
  vect_last_16.14_74 = (vector(4) int) vect_aval_13.10_70;
  last_16 = (int) aval_13;
  mask__9.15_76 = vect__7.13_73 < vect_cst__75;
  _9 = _7 < min_v_15(D);
  vect_last_8.16_77 = VEC_COND_EXPR <mask__9.15_76, vect_last_16.14_74, vect_last_19.7_67>;
  last_8 = _9 ? last_16 : last_19;
  i_17 = i_21 + 1;
  ivtmp_10 = ivtmp_18 - 1;
  vectp_a.8_69 = vectp_a.8_68 + 8;
  vectp_b.11_72 = vectp_b.11_71 + 16;
  _81 = VEC_COND_EXPR <mask__9.15_76, ivtmp_78, _80>;
  ivtmp_79 = ivtmp_78 + { 4, 4, 4, 4 };
  ivtmp_91 = ivtmp_90 + 1;
  if (ivtmp_91 < 10)
    goto <bb 7>; [90.00%]
  else
    goto <bb 20>; [10.00%]

  <bb 7> [local count: 219365424]:
  goto <bb 3>; [100.00%]

  <bb 20> [local count: 24373936]:
  # last_66 = PHI <last_8(3)>
  # vect_last_8.16_82 = PHI <vect_last_8.16_77(3)>
  # _92 = PHI <_81(3)>
  _83 = .REDUC_MAX (_92);
  _84 = {_83, _83, _83, _83};
  _85 = _92 == _84;
  _86 = VEC_COND_EXPR <_85, vect_last_8.16_82, { 0, 0, 0, 0 }>;
  _87 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(_86);
  _88 = .REDUC_MAX (_87);
  _89 = (int) _88;

  <bb 21> [local count: 73121805]:
  # last_52 = PHI <_89(20), last_62(22)>
  # i_53 = PHI <40(20), i_63(22)>
  # ivtmp_54 = PHI <3(20), ivtmp_64(22)>
  _55 = (long unsigned int) i_53;
  _56 = _55 * 2;
  _57 = a_12(D) + _56;
  aval_58 = *_57;
  _59 = _55 * 4;
  _60 = b_14(D) + _59;
  _61 = *_60;
  if (_61 < min_v_15(D))
    goto <bb 24>; [50.00%]
  else
    goto <bb 23>; [50.00%]

  <bb 22> [local count: 48747874]:
  goto <bb 21>; [100.00%]

  <bb 23> [local count: 73121805]:
  # last_62 = PHI <last_52(21), last_65(24)>
  i_63 = i_53 + 1;
  ivtmp_64 = ivtmp_54 - 1;
  if (ivtmp_64 != 0)
    goto <bb 22>; [66.67%]
  else
    goto <bb 17>; [33.33%]

  <bb 24> [local count: 36560903]:
  last_65 = (int) aval_58;
  goto <bb 23>; [100.00%]

  <bb 17> [local count: 24373936]:
  # last_50 = PHI <last_62(23)>
  return last_50;

}
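For anyone reading the dump without the testcase at hand, here is a rough C sketch of what the vectorized condition reduction above appears to compute. It is my reading of the GIMPLE, not the testcase source and not vectorizer code: the function names, the macros N and VF, and the per-lane arrays are made up for illustration, and the "vector" operations are emulated lane by lane. Each lane remembers the last value it selected together with the 1-based index at which it selected it; the epilogue takes the maximum index (.REDUC_MAX on _92), keeps only the lane holding that index, and reduces again. If nothing matched, all indices are 0, every lane still holds the initial 108, and 108 comes out.

```c
#include <assert.h>

#define N  43   /* trip count: ivtmp_18 starts at 43 in the dump */
#define VF 4    /* vector(4) int */

/* Scalar loop as it can be read back from the scalar statements in bb 3. */
int condition_reduction_scalar(const short *a, int min_v, const int *b)
{
    int last = 108;
    for (int i = 0; i < N; i++)
        if (b[i] < min_v)
            last = a[i];
    return last;
}

/* Lane-by-lane emulation of the vector loop (bb 3), the reduction
   epilogue (bb 20) and the scalar tail loop (bb 21..24). */
int condition_reduction_emulated(const short *a, int min_v, const int *b)
{
    int      vect_last[VF] = {108, 108, 108, 108}; /* vect_last_19.7_67 */
    unsigned idx[VF]       = {1, 2, 3, 4};         /* ivtmp_78 */
    unsigned hit[VF]       = {0, 0, 0, 0};         /* _80 / _81 */
    int i = 0;

    for (int iter = 0; iter < N / VF; iter++, i += VF) {
        for (int l = 0; l < VF; l++) {
            if (b[i + l] < min_v) {        /* mask__9.15_76 */
                vect_last[l] = a[i + l];   /* vect_last_8.16_77 */
                hit[l] = idx[l];           /* _81 */
            }
            idx[l] += VF;                  /* ivtmp_79 */
        }
    }
    assert(i == 40);                       /* i_53 = PHI <40(20), ...> */

    /* _83 = .REDUC_MAX (_92): index of the most recent match, 0 if none. */
    unsigned max_idx = 0;
    for (int l = 0; l < VF; l++)
        if (hit[l] > max_idx)
            max_idx = hit[l];

    /* _85/_86/_87/_88: zero every lane that does not hold max_idx, then
       take an unsigned max so a negative selected value still wins. */
    unsigned red = 0;
    for (int l = 0; l < VF; l++) {
        unsigned v = (hit[l] == max_idx) ? (unsigned) vect_last[l] : 0u;
        if (v > red)
            red = v;
    }
    /* _89 = (int) _88; relies on modulo conversion, which GCC documents. */
    int last = (int) red;

    /* Scalar epilogue for the remaining N % VF = 3 elements. */
    for (; i < N; i++)
        if (b[i] < min_v)
            last = a[i];
    return last;
}
```

Feeding both functions the same a, b and min_v should give identical results, including the case where no b[i] is below min_v and the initial 108 is returned.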