Hi Richard,

on 2020/5/29 4:32 PM, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> on 2020/5/27 6:02 PM, Richard Sandiford wrote:
>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>> Hi Richard,
>>>>

Snip ...

>>
>> Thanks a lot for your detailed explanation!  This proposal looks good
>> based on the current implementation of both masking and length.  I may
>> think too much, but I had a bit concern as below when some targets have
>> both masking and length supports in future, such as ppc adds masking
>> support like SVE.
>>
>> I assumed that you meant each vectorizable_* routine should record the
>> objs for any available partial vectorisation approaches.  If one target
>> supports both, we would have both recorded but decide not to do partial
>> vectorisation finally since both have records.  The target can disable
>> length like through optab to resolve it, but there is one possibility
>> that the masking support can be imperfect initially since ISA support
>> could be gradual, it further leads some vectorizable_* check or final
>> verification to fail for masking, and length approach may work here but
>> it gets disabled.  We can miss to use partial vectorisation here.
>>
>> The other assumption is that each vectorizable_* routine record the 
>> first available partial vectorisation approach, let's assume masking
>> takes preference, then it's fine to record just one here even if one
>> target supports both approaches, but we still have the possiblity to
>> miss the partial vectorisation chance as some check/verify fail with
>> masking but fine with length.
>>
>> Does this concern make sense?
> 
> There's nothing to stop us using masks and lengths in the same loop
> in future if we need to.  It would “just” be a case of setting up both
> the masks and the lengths in vect_set_loop_condition.  But the point is
> that doing that would be extra code, and there's no point writing that
> extra code until it's needed.
> 
> If some future arch does support both mask-based and length-based
> approaches, I think that's even less reason to make a binary choice
> between them.  How we prioritise the length and mask approaches when
> both are available is something that we'll have to decide at the time.
> 
> If your concern is that the arch might support masked operations
> without wanting them to be used for loop control, we could test for
> that case by checking whether while_ult_optab is implemented.
> 
> Thanks,
> Richard
> 

Thanks for your further explanation.  As you pointed out, my concern
is just one case of mixing the mask-based and length-based approaches.
I didn't realize that and thought we would still use only one approach
per loop, so the concern doesn't really hold.

The attached v3 patch uses can_partial_vect_p.  During regression
testing with an explicit vect-with-length-scope setting, I saw several
reduction failures, so I updated vectorizable_condition to set
can_partial_vect_p to false for !EXTRACT_LAST_REDUCTION, following your
guidance that it should either record something or clear
can_partial_vect_p.

Bootstrapped/regtested on powerpc64le-linux-gnu (P9); no notable
failures were found, even with explicit vect-with-length-scope settings.
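
(For reference, the with-length runs simply add the new parameter on
top of the usual options, e.g. something like:

  gcc -O2 -ftree-vectorize --param vect-with-length-scope=2 test.c

where test.c stands in for whichever test is being compiled.)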

However, I hit one regression failure on aarch64-linux-gnu:

PASS->FAIL: gcc.target/aarch64/sve/reduc_8.c -march=armv8.2-a+sve  
scan-assembler-not \\tcmpeq\\tp[0-9]+\\.s,

It's caused by the change in vectorizable_condition; without that
change, the outer loop can use full masking.  The reduction_type here
is TREE_CODE_REDUCTION, so can_partial_vect_p gets cleared.
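
For reference, the guard now has roughly the following shape (a
simplified sketch rather than the exact hunk; the surrounding variable
names are approximate):

  /* In vectorizable_condition, while partial vectorization is still a
     candidate for the loop being analyzed.  */
  if (loop_vinfo && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
    {
      if (reduction_type == EXTRACT_LAST_REDUCTION)
        /* Record the masks this conditional operation would need.  */
        vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
                               ncopies * vec_num, vectype, NULL);
      else
        /* TREE_CODE_REDUCTION and friends: give up on partial
           vectorization for this loop.  */
        LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
    }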

From the optimized dump, the earlier IR looks fine.  The reduction is
for the inner loop, but we are checking partial vectorization for the
outer loop.  I'm not sure whether adjusting the current guard is
reasonable for this case.  Could you give some insight?  Thanks in
advance!

BR,
Kewen
------
gcc/ChangeLog

2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>

        * doc/invoke.texi (vect-with-length-scope): Document new option.
        * params.opt (vect-with-length-scope): New.
        * tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
        (vect_set_loop_mask_or_len): ... this.  Update variable names
        accordingly.
        (vect_maybe_permute_loop_masks): Replace rgroup_masks with rgroup_objs.
        (vect_set_loop_masks_directly): Renamed to ...
        (vect_set_loop_objs_directly): ... this.  Extend the support to cover
        vector with length, call vect_gen_len for length, replace rgroup_masks
        with rgroup_objs, replace vect_set_loop_mask with
        vect_set_loop_mask_or_len.
        (vect_set_loop_condition_masked): Renamed to ...
        (vect_set_loop_condition_partial): ... this.  Extend the support to
        cover length-based partial vectorization, replace rgroup_masks with
        rgroup_objs, replace vect_iv_limit_for_full_masking with
        vect_iv_limit_for_partial_vect.
        (vect_set_loop_condition_unmasked): Renamed to ...
        (vect_set_loop_condition_normal): ... this.
        (vect_set_loop_condition): Replace vect_set_loop_condition_masked with
        vect_set_loop_condition_partial, replace
        vect_set_loop_condition_unmasked with vect_set_loop_condition_normal.
        (vect_gen_vector_loop_niters): Use LOOP_VINFO_PARTIAL_VECT_P for
        partial vectorization case instead of LOOP_VINFO_FULLY_MASKED_P.
        (vect_do_peeling): Use LOOP_VINFO_PARTIAL_VECT_P for partial
        vectorization case instead of LOOP_VINFO_FULLY_MASKED_P, adjust for
        epilogue handling for length-based partial vectorization.
        * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
        fully_with_length_p and epil_partial_vect_p, replace can_fully_mask_p
        with can_partial_vect_p.
        (release_vec_loop_masks): Replace rgroup_masks with rgroup_objs.
        (release_vec_loop_lens): New function.
        (_loop_vec_info::~_loop_vec_info): Use it to free the loop lens.
        (can_produce_all_loop_masks_p): Replace rgroup_masks with rgroup_objs.
        (vect_get_max_nscalars_per_iter): Likewise.
        (min_prec_for_max_niters): New function.  Factored out from ...
        (vect_verify_full_masking): ... this.  Replace
        vect_iv_limit_for_full_masking with vect_iv_limit_for_partial_vect.
        (vect_verify_loop_lens): New function.
        (vect_analyze_loop_costing): Use LOOP_VINFO_PARTIAL_VECT_P for partial
        vectorization case instead of LOOP_VINFO_FULLY_MASKED_P.
        (determine_peel_for_niter): Likewise.
        (vect_analyze_loop_2): Replace LOOP_VINFO_CAN_FULLY_MASK_P with
        LOOP_VINFO_CAN_PARTIAL_VECT_P, replace LOOP_VINFO_FULLY_MASKED_P with
        LOOP_VINFO_PARTIAL_VECT_P.  Check loop-wide reasons for disabling loops
        with length.  Make the final decision about using vector access with
        length or not.  Disable LOOP_VINFO_CAN_PARTIAL_VECT_P if both
        mask-based and length-based approaches are recorded.  Mark the
        epilogue to go with the length-based approach if suitable.
        (vect_analyze_loop): Add handling for the epilogue of a loop that is
        marked to use the partial vectorization approach.
        (vect_estimate_min_profitable_iters): Replace rgroup_masks with
        rgroup_objs.  Adjust for loop with length-based partial vectorization.
        (vectorizable_reduction): Replace LOOP_VINFO_CAN_FULLY_MASK_P with
        LOOP_VINFO_CAN_PARTIAL_VECT_P, adjust some dump messages.
        (vectorizable_live_operation): Likewise.
        (vect_record_loop_mask): Replace rgroup_masks with rgroup_objs.
        (vect_get_loop_mask): Likewise.
        (vect_record_loop_len): New function.
        (vect_get_loop_len): Likewise.
        (vect_transform_loop): Use LOOP_VINFO_PARTIAL_VECT_P for partial
        vectorization case instead of LOOP_VINFO_FULLY_MASKED_P.
        (vect_iv_limit_for_full_masking): Renamed to ...
        (vect_iv_limit_for_partial_vect): ... here. 
        * tree-vect-stmts.c (check_load_store_masking): Renamed to ...
        (check_load_store_partial_vect): ... here.  Add length-based partial
        vectorization checks.
        (vectorizable_operation): Replace LOOP_VINFO_CAN_FULLY_MASK_P with
        LOOP_VINFO_CAN_PARTIAL_VECT_P.
        (vectorizable_store): Replace check_load_store_masking with
        check_load_store_partial_vect.  Add handling for length-based partial
        vectorization.
        (vectorizable_load): Likewise.
        (vectorizable_condition): Replace LOOP_VINFO_CAN_FULLY_MASK_P with
        LOOP_VINFO_CAN_PARTIAL_VECT_P.  Guard partial vectorization reduction
        only for EXTRACT_LAST_REDUCTION.
        (vect_gen_len): New function.
        * tree-vectorizer.h (struct rgroup_masks): Renamed to ...
        (struct rgroup_objs): ... this.  Add anonymous union to field
        max_nscalars_per_iter and mask_type.
        (vec_loop_lens): New typedef.
        (_loop_vec_info): Add lens, fully_with_length_p and
        epil_partial_vect_p.  Rename can_fully_mask_p to can_partial_vect_p.
        (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ...
        (LOOP_VINFO_CAN_PARTIAL_VECT_P): ... this.
        (LOOP_VINFO_FULLY_WITH_LENGTH_P): New macro.
        (LOOP_VINFO_EPIL_PARTIAL_VECT_P): Likewise.
        (LOOP_VINFO_LENS): Likewise.
        (LOOP_VINFO_PARTIAL_VECT_P): Likewise.
        (vect_iv_limit_for_full_masking): Renamed to ...
        (vect_iv_limit_for_partial_vect): ... this.
        (vect_record_loop_len): New declaration.
        (vect_get_loop_len): Likewise.
        (vect_gen_len): Likewise.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8b9935dfe65..ac765feab13 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13079,6 +13079,13 @@ by the copy loop headers pass.
 @item vect-epilogues-nomask
 Enable loop epilogue vectorization using smaller vector size.
 
+@item vect-with-length-scope
+Control the scope of vector memory access with length exploitation.  0 means we
+don't exploit any vector memory access with length, 1 means we only exploit
+vector memory access with length for those loops whose iteration count is
+less than VF, such as very small loops or epilogues, 2 means we want to exploit
+vector memory access with length for any loop if possible.
+
 @item slp-max-insns-in-bb
 Maximum number of instructions in basic block to be
 considered for SLP vectorization.
diff --git a/gcc/params.opt b/gcc/params.opt
index 4aec480798b..d4309101067 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -964,4 +964,8 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
 Common Joined UInteger Var(param_vect_max_version_for_alignment_checks) Init(6) Param Optimization
 Bound on number of runtime checks inserted by the vectorizer's loop versioning for alignment check.
 
+-param=vect-with-length-scope=
+Common Joined UInteger Var(param_vect_with_length_scope) Init(0) IntegerRange(0, 2) Param Optimization
+Control the vector with length exploitation scope.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 8c5e696b995..0a5770c7d28 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -256,17 +256,17 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
                        gimple_bb (update_phi));
 }
 
-/* Define one loop mask MASK from loop LOOP.  INIT_MASK is the value that
-   the mask should have during the first iteration and NEXT_MASK is the
+/* Define one loop mask/length OBJ from loop LOOP.  INIT_OBJ is the value that
+   the mask/length should have during the first iteration and NEXT_OBJ is the
    value that it should have on subsequent iterations.  */
 
 static void
-vect_set_loop_mask (class loop *loop, tree mask, tree init_mask,
-                   tree next_mask)
+vect_set_loop_mask_or_len (class loop *loop, tree obj, tree init_obj,
+                          tree next_obj)
 {
-  gphi *phi = create_phi_node (mask, loop->header);
-  add_phi_arg (phi, init_mask, loop_preheader_edge (loop), UNKNOWN_LOCATION);
-  add_phi_arg (phi, next_mask, loop_latch_edge (loop), UNKNOWN_LOCATION);
+  gphi *phi = create_phi_node (obj, loop->header);
+  add_phi_arg (phi, init_obj, loop_preheader_edge (loop), UNKNOWN_LOCATION);
+  add_phi_arg (phi, next_obj, loop_latch_edge (loop), UNKNOWN_LOCATION);
 }
 
 /* Add SEQ to the end of LOOP's preheader block.  */
@@ -320,8 +320,8 @@ interleave_supported_p (vec_perm_indices *indices, tree vectype,
    latter.  Return true on success, adding any new statements to SEQ.  */
 
 static bool
-vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_masks *dest_rgm,
-                              rgroup_masks *src_rgm)
+vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_objs *dest_rgm,
+                              rgroup_objs *src_rgm)
 {
   tree src_masktype = src_rgm->mask_type;
   tree dest_masktype = dest_rgm->mask_type;
@@ -338,10 +338,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_masks *dest_rgm,
       machine_mode dest_mode = insn_data[icode1].operand[0].mode;
       gcc_assert (dest_mode == insn_data[icode2].operand[0].mode);
       tree unpack_masktype = vect_halve_mask_nunits (src_masktype, dest_mode);
-      for (unsigned int i = 0; i < dest_rgm->masks.length (); ++i)
+      for (unsigned int i = 0; i < dest_rgm->objs.length (); ++i)
        {
-         tree src = src_rgm->masks[i / 2];
-         tree dest = dest_rgm->masks[i];
+         tree src = src_rgm->objs[i / 2];
+         tree dest = dest_rgm->objs[i];
          tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
                            ? VEC_UNPACK_HI_EXPR
                            : VEC_UNPACK_LO_EXPR);
@@ -371,10 +371,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_masks *dest_rgm,
       tree masks[2];
       for (unsigned int i = 0; i < 2; ++i)
        masks[i] = vect_gen_perm_mask_checked (src_masktype, indices[i]);
-      for (unsigned int i = 0; i < dest_rgm->masks.length (); ++i)
+      for (unsigned int i = 0; i < dest_rgm->objs.length (); ++i)
        {
-         tree src = src_rgm->masks[i / 2];
-         tree dest = dest_rgm->masks[i];
+         tree src = src_rgm->objs[i / 2];
+         tree dest = dest_rgm->objs[i];
          gimple *stmt = gimple_build_assign (dest, VEC_PERM_EXPR,
                                              src, src, masks[i & 1]);
          gimple_seq_add_stmt (seq, stmt);
@@ -384,60 +384,80 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_masks *dest_rgm,
   return false;
 }
 
-/* Helper for vect_set_loop_condition_masked.  Generate definitions for
-   all the masks in RGM and return a mask that is nonzero when the loop
+/* Helper for vect_set_loop_condition_partial.  Generate definitions for
+   all the objs in RGO and return an obj that is nonzero when the loop
    needs to iterate.  Add any new preheader statements to PREHEADER_SEQ.
    Use LOOP_COND_GSI to insert code before the exit gcond.
 
-   RGM belongs to loop LOOP.  The loop originally iterated NITERS
+   RGO belongs to loop LOOP.  The loop originally iterated NITERS
    times and has been vectorized according to LOOP_VINFO.
 
    If NITERS_SKIP is nonnull, the first iteration of the vectorized loop
    starts with NITERS_SKIP dummy iterations of the scalar loop before
-   the real work starts.  The mask elements for these dummy iterations
+   the real work starts.  The obj elements for these dummy iterations
    must be 0, to ensure that the extra iterations do not have an effect.
 
    It is known that:
 
-     NITERS * RGM->max_nscalars_per_iter
+     NITERS * RGO->max_nscalars_per_iter
 
    does not overflow.  However, MIGHT_WRAP_P says whether an induction
    variable that starts at 0 and has step:
 
-     VF * RGM->max_nscalars_per_iter
+     VF * RGO->max_nscalars_per_iter
 
    might overflow before hitting a value above:
 
-     (NITERS + NITERS_SKIP) * RGM->max_nscalars_per_iter
+     (NITERS + NITERS_SKIP) * RGO->max_nscalars_per_iter
 
    This means that we cannot guarantee that such an induction variable
-   would ever hit a value that produces a set of all-false masks for RGM.  */
+   would ever hit a value that produces a set of all-false masks or
+   zero byte length for RGO.  */
 
 static tree
-vect_set_loop_masks_directly (class loop *loop, loop_vec_info loop_vinfo,
+vect_set_loop_objs_directly (class loop *loop, loop_vec_info loop_vinfo,
                              gimple_seq *preheader_seq,
                              gimple_stmt_iterator loop_cond_gsi,
-                             rgroup_masks *rgm, tree niters, tree niters_skip,
+                             rgroup_objs *rgo, tree niters, tree niters_skip,
                              bool might_wrap_p)
 {
   tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_MASK_IV_TYPE (loop_vinfo);
-  tree mask_type = rgm->mask_type;
-  unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
-  poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  bool vect_for_masking = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  if (!vect_for_masking)
+    {
+      /* Obtain target supported length type.  */
+      scalar_int_mode len_mode = targetm.vectorize.length_mode;
+      unsigned int len_prec = GET_MODE_PRECISION (len_mode);
+      compare_type = build_nonstandard_integer_type (len_prec, true);
+      /* Simply set iv_type as same as compare_type.  */
+      iv_type = compare_type;
+    }
+
+  tree obj_type = rgo->mask_type;
+  /* Here, take nscalars_per_iter as nbytes_per_iter for length.  */
+  unsigned int nscalars_per_iter = rgo->max_nscalars_per_iter;
+  poly_uint64 nscalars_per_obj = TYPE_VECTOR_SUBPARTS (obj_type);
+  poly_uint64 vector_size = GET_MODE_SIZE (TYPE_MODE (obj_type));
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  tree vec_size = NULL_TREE;
+  /* For length, we probably need vec_size to check length in range.  */
+  if (!vect_for_masking)
+    vec_size = build_int_cst (compare_type, vector_size);
 
   /* Calculate the maximum number of scalar values that the rgroup
      handles in total, the number that it handles for each iteration
      of the vector loop, and the number that it should skip during the
-     first iteration of the vector loop.  */
+     first iteration of the vector loop.  For vector with length, take
+     scalar values as bytes.  */
   tree nscalars_total = niters;
   tree nscalars_step = build_int_cst (iv_type, vf);
   tree nscalars_skip = niters_skip;
   if (nscalars_per_iter != 1)
     {
-      /* We checked before choosing to use a fully-masked loop that these
-        multiplications don't overflow.  */
+      /* We checked before choosing to use a fully-masked or fully with length
+        loop that these multiplications don't overflow.  */
       tree compare_factor = build_int_cst (compare_type, nscalars_per_iter);
       tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
       nscalars_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
@@ -541,28 +561,28 @@ vect_set_loop_masks_directly (class loop *loop, loop_vec_info loop_vinfo,
   test_index = gimple_convert (&test_seq, compare_type, test_index);
   gsi_insert_seq_before (test_gsi, test_seq, GSI_SAME_STMT);
 
-  /* Provide a definition of each mask in the group.  */
-  tree next_mask = NULL_TREE;
-  tree mask;
+  /* Provide a definition of each obj in the group.  */
+  tree next_obj = NULL_TREE;
+  tree obj;
   unsigned int i;
-  FOR_EACH_VEC_ELT_REVERSE (rgm->masks, i, mask)
+  poly_uint64 batch_cnt = vect_for_masking ? nscalars_per_obj : vector_size;
+  FOR_EACH_VEC_ELT_REVERSE (rgo->objs, i, obj)
     {
-      /* Previous masks will cover BIAS scalars.  This mask covers the
+      /* Previous objs will cover BIAS scalars.  This obj covers the
         next batch.  */
-      poly_uint64 bias = nscalars_per_mask * i;
+      poly_uint64 bias = batch_cnt * i;
       tree bias_tree = build_int_cst (compare_type, bias);
-      gimple *tmp_stmt;
 
       /* See whether the first iteration of the vector loop is known
-        to have a full mask.  */
+        to have a full mask or length.  */
       poly_uint64 const_limit;
       bool first_iteration_full
        = (poly_int_tree_p (first_limit, &const_limit)
-          && known_ge (const_limit, (i + 1) * nscalars_per_mask));
+          && known_ge (const_limit, (i + 1) * batch_cnt));
 
       /* Rather than have a new IV that starts at BIAS and goes up to
         TEST_LIMIT, prefer to use the same 0-based IV for each mask
-        and adjust the bound down by BIAS.  */
+        or length and adjust the bound down by BIAS.  */
       tree this_test_limit = test_limit;
       if (i != 0)
        {
@@ -574,9 +594,9 @@ vect_set_loop_masks_directly (class loop *loop, loop_vec_info loop_vinfo,
                                          bias_tree);
        }
 
-      /* Create the initial mask.  First include all scalars that
+      /* Create the initial obj.  First include all scalars that
         are within the loop limit.  */
-      tree init_mask = NULL_TREE;
+      tree init_obj = NULL_TREE;
       if (!first_iteration_full)
        {
          tree start, end;
@@ -598,9 +618,18 @@ vect_set_loop_masks_directly (class loop *loop, loop_vec_info loop_vinfo,
              end = first_limit;
            }
 
-         init_mask = make_temp_ssa_name (mask_type, NULL, "max_mask");
-         tmp_stmt = vect_gen_while (init_mask, start, end);
-         gimple_seq_add_stmt (preheader_seq, tmp_stmt);
+         if (vect_for_masking)
+           {
+             init_obj = make_temp_ssa_name (obj_type, NULL, "max_mask");
+             gimple *tmp_stmt = vect_gen_while (init_obj, start, end);
+             gimple_seq_add_stmt (preheader_seq, tmp_stmt);
+           }
+         else
+           {
+             init_obj = make_temp_ssa_name (compare_type, NULL, "max_len");
+             gimple_seq seq = vect_gen_len (init_obj, start, end, vec_size);
+             gimple_seq_add_seq (preheader_seq, seq);
+           }
        }
 
       /* Now AND out the bits that are within the number of skipped
@@ -610,51 +639,76 @@ vect_set_loop_masks_directly (class loop *loop, loop_vec_info loop_vinfo,
          && !(poly_int_tree_p (nscalars_skip, &const_skip)
               && known_le (const_skip, bias)))
        {
-         tree unskipped_mask = vect_gen_while_not (preheader_seq, mask_type,
+         tree unskipped_mask = vect_gen_while_not (preheader_seq, obj_type,
                                                    bias_tree, nscalars_skip);
-         if (init_mask)
-           init_mask = gimple_build (preheader_seq, BIT_AND_EXPR, mask_type,
-                                     init_mask, unskipped_mask);
+         if (init_obj)
+           init_obj = gimple_build (preheader_seq, BIT_AND_EXPR, obj_type,
+                                     init_obj, unskipped_mask);
          else
-           init_mask = unskipped_mask;
+           init_obj = unskipped_mask;
+         gcc_assert (vect_for_masking);
        }
 
-      if (!init_mask)
-       /* First iteration is full.  */
-       init_mask = build_minus_one_cst (mask_type);
+      /* First iteration is full.  */
+      if (!init_obj)
+       {
+         if (vect_for_masking)
+           init_obj = build_minus_one_cst (obj_type);
+         else
+           init_obj = vec_size;
+       }
 
-      /* Get the mask value for the next iteration of the loop.  */
-      next_mask = make_temp_ssa_name (mask_type, NULL, "next_mask");
-      gcall *call = vect_gen_while (next_mask, test_index, this_test_limit);
-      gsi_insert_before (test_gsi, call, GSI_SAME_STMT);
+      /* Get the obj value for the next iteration of the loop.  */
+      if (vect_for_masking)
+       {
+         next_obj = make_temp_ssa_name (obj_type, NULL, "next_mask");
+         gcall *call = vect_gen_while (next_obj, test_index, this_test_limit);
+         gsi_insert_before (test_gsi, call, GSI_SAME_STMT);
+       }
+      else
+       {
+         next_obj = make_temp_ssa_name (compare_type, NULL, "next_len");
+         tree end = this_test_limit;
+         gimple_seq seq = vect_gen_len (next_obj, test_index, end, vec_size);
+         gsi_insert_seq_before (test_gsi, seq, GSI_SAME_STMT);
+       }
 
-      vect_set_loop_mask (loop, mask, init_mask, next_mask);
+      vect_set_loop_mask_or_len (loop, obj, init_obj, next_obj);
     }
-  return next_mask;
+  return next_obj;
 }
 
-/* Make LOOP iterate NITERS times using masking and WHILE_ULT calls.
-   LOOP_VINFO describes the vectorization of LOOP.  NITERS is the
-   number of iterations of the original scalar loop that should be
-   handled by the vector loop.  NITERS_MAYBE_ZERO and FINAL_IV are
-   as for vect_set_loop_condition.
+/* Make LOOP iterate NITERS times using objects like masks (and
+   WHILE_ULT calls) or lengths.  LOOP_VINFO describes the vectorization
+   of LOOP.  NITERS is the number of iterations of the original scalar
+   loop that should be handled by the vector loop.  NITERS_MAYBE_ZERO
+   and FINAL_IV are as for vect_set_loop_condition.
 
    Insert the branch-back condition before LOOP_COND_GSI and return the
    final gcond.  */
 
 static gcond *
-vect_set_loop_condition_masked (class loop *loop, loop_vec_info loop_vinfo,
-                               tree niters, tree final_iv,
-                               bool niters_maybe_zero,
-                               gimple_stmt_iterator loop_cond_gsi)
+vect_set_loop_condition_partial (class loop *loop, loop_vec_info loop_vinfo,
+                                tree niters, tree final_iv,
+                                bool niters_maybe_zero,
+                                gimple_stmt_iterator loop_cond_gsi)
 {
   gimple_seq preheader_seq = NULL;
   gimple_seq header_seq = NULL;
 
+  bool vect_for_masking = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
   tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
+  if (!vect_for_masking)
+    {
+      /* Obtain target supported length type as compare_type.  */
+      scalar_int_mode len_mode = targetm.vectorize.length_mode;
+      unsigned len_prec = GET_MODE_PRECISION (len_mode);
+      compare_type = build_nonstandard_integer_type (len_prec, true);
+    }
   unsigned int compare_precision = TYPE_PRECISION (compare_type);
-  tree orig_niters = niters;
 
+  tree orig_niters = niters;
   /* Type of the initial value of NITERS.  */
   tree ni_actual_type = TREE_TYPE (niters);
   unsigned int ni_actual_precision = TYPE_PRECISION (ni_actual_type);
@@ -677,42 +731,45 @@ vect_set_loop_condition_masked (class loop *loop, loop_vec_info loop_vinfo,
   else
     niters = gimple_convert (&preheader_seq, compare_type, niters);
 
-  widest_int iv_limit = vect_iv_limit_for_full_masking (loop_vinfo);
+  widest_int iv_limit = vect_iv_limit_for_partial_vect (loop_vinfo);
 
-  /* Iterate over all the rgroups and fill in their masks.  We could use
-     the first mask from any rgroup for the loop condition; here we
+  /* Iterate over all the rgroups and fill in their objs.  We could use
+     the first obj from any rgroup for the loop condition; here we
      arbitrarily pick the last.  */
-  tree test_mask = NULL_TREE;
-  rgroup_masks *rgm;
+  tree test_obj = NULL_TREE;
+  rgroup_objs *rgo;
   unsigned int i;
-  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
-  FOR_EACH_VEC_ELT (*masks, i, rgm)
-    if (!rgm->masks.is_empty ())
+  auto_vec<rgroup_objs> *objs = vect_for_masking
+                                 ? &LOOP_VINFO_MASKS (loop_vinfo)
+                                 : &LOOP_VINFO_LENS (loop_vinfo);
+
+  FOR_EACH_VEC_ELT (*objs, i, rgo)
+    if (!rgo->objs.is_empty ())
       {
        /* First try using permutes.  This adds a single vector
           instruction to the loop for each mask, but needs no extra
           loop invariants or IVs.  */
        unsigned int nmasks = i + 1;
-       if ((nmasks & 1) == 0)
+       if (vect_for_masking && (nmasks & 1) == 0)
          {
-           rgroup_masks *half_rgm = &(*masks)[nmasks / 2 - 1];
-           if (!half_rgm->masks.is_empty ()
-               && vect_maybe_permute_loop_masks (&header_seq, rgm, half_rgm))
+           rgroup_objs *half_rgo = &(*objs)[nmasks / 2 - 1];
+           if (!half_rgo->objs.is_empty ()
+               && vect_maybe_permute_loop_masks (&header_seq, rgo, half_rgo))
              continue;
          }
 
        /* See whether zero-based IV would ever generate all-false masks
-          before wrapping around.  */
+          or zero byte length before wrapping around.  */
        bool might_wrap_p
          = (iv_limit == -1
-            || (wi::min_precision (iv_limit * rgm->max_nscalars_per_iter,
+            || (wi::min_precision (iv_limit * rgo->max_nscalars_per_iter,
                                    UNSIGNED)
                 > compare_precision));
 
-       /* Set up all masks for this group.  */
-       test_mask = vect_set_loop_masks_directly (loop, loop_vinfo,
+       /* Set up all masks/lengths for this group.  */
+       test_obj = vect_set_loop_objs_directly (loop, loop_vinfo,
                                                  &preheader_seq,
-                                                 loop_cond_gsi, rgm,
+                                                 loop_cond_gsi, rgo,
                                                  niters, niters_skip,
                                                  might_wrap_p);
       }
@@ -724,8 +781,8 @@ vect_set_loop_condition_masked (class loop *loop, loop_vec_info loop_vinfo,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_mask = build_zero_cst (TREE_TYPE (test_mask));
-  gcond *cond_stmt = gimple_build_cond (code, test_mask, zero_mask,
+  tree zero_obj = build_zero_cst (TREE_TYPE (test_obj));
+  gcond *cond_stmt = gimple_build_cond (code, test_obj, zero_obj,
                                        NULL_TREE, NULL_TREE);
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
@@ -748,13 +805,12 @@ vect_set_loop_condition_masked (class loop *loop, loop_vec_info loop_vinfo,
 }
 
 /* Like vect_set_loop_condition, but handle the case in which there
-   are no loop masks.  */
+   are no loop masks/lengths.  */
 
 static gcond *
-vect_set_loop_condition_unmasked (class loop *loop, tree niters,
-                                 tree step, tree final_iv,
-                                 bool niters_maybe_zero,
-                                 gimple_stmt_iterator loop_cond_gsi)
+vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
+                             tree final_iv, bool niters_maybe_zero,
+                             gimple_stmt_iterator loop_cond_gsi)
 {
   tree indx_before_incr, indx_after_incr;
   gcond *cond_stmt;
@@ -912,14 +968,14 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
 
-  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-    cond_stmt = vect_set_loop_condition_masked (loop, loop_vinfo, niters,
-                                               final_iv, niters_maybe_zero,
-                                               loop_cond_gsi);
+  if (loop_vinfo && LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo))
+    cond_stmt
+      = vect_set_loop_condition_partial (loop, loop_vinfo, niters, final_iv,
+                                        niters_maybe_zero, loop_cond_gsi);
   else
-    cond_stmt = vect_set_loop_condition_unmasked (loop, niters, step,
-                                                 final_iv, niters_maybe_zero,
-                                                 loop_cond_gsi);
+    cond_stmt
+      = vect_set_loop_condition_normal (loop, niters, step, final_iv,
+                                       niters_maybe_zero, loop_cond_gsi);
 
   /* Remove old loop exit test.  */
   stmt_vec_info orig_cond_info;
@@ -1938,8 +1994,7 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
     ni_minus_gap = niters;
 
   unsigned HOST_WIDE_INT const_vf;
-  if (vf.is_constant (&const_vf)
-      && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  if (vf.is_constant (&const_vf) && !LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo))
     {
       /* Create: niters >> log2(vf) */
       /* If it's known that niters == number of latch executions + 1 doesn't
@@ -2471,7 +2526,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   poly_uint64 bound_epilog = 0;
-  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+  if (!LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
     bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
@@ -2567,7 +2622,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (vect_epilogues
       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && prolog_peeling >= 0
-      && known_eq (vf, lowest_vf))
+      && known_eq (vf, lowest_vf)
+      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (epilogue_vinfo))
     {
       unsigned HOST_WIDE_INT eiters
        = (LOOP_VINFO_INT_NITERS (loop_vinfo)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 80e33b61be7..99e6cb904ba 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -813,8 +813,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     vec_outside_cost (0),
     vec_inside_cost (0),
     vectorizable (false),
-    can_fully_mask_p (true),
+    can_partial_vect_p (true),
     fully_masked_p (false),
+    fully_with_length_p (false),
+    epil_partial_vect_p (false),
     peeling_for_gaps (false),
     peeling_for_niter (false),
     no_data_dependencies (false),
@@ -880,13 +882,25 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
 void
 release_vec_loop_masks (vec_loop_masks *masks)
 {
-  rgroup_masks *rgm;
+  rgroup_objs *rgm;
   unsigned int i;
   FOR_EACH_VEC_ELT (*masks, i, rgm)
-    rgm->masks.release ();
+    rgm->objs.release ();
   masks->release ();
 }
 
+/* Free all levels of LENS.  */
+
+void
+release_vec_loop_lens (vec_loop_lens *lens)
+{
+  rgroup_objs *rgl;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (*lens, i, rgl)
+    rgl->objs.release ();
+  lens->release ();
+}
+
 /* Free all memory used by the _loop_vec_info, as well as all the
    stmt_vec_info structs of all the stmts in the loop.  */
 
@@ -895,6 +909,7 @@ _loop_vec_info::~_loop_vec_info ()
   free (bbs);
 
   release_vec_loop_masks (&masks);
+  release_vec_loop_lens (&lens);
   delete ivexpr_map;
   delete scan_map;
   epilogue_vinfos.release ();
@@ -935,7 +950,7 @@ cse_and_gimplify_to_preheader (loop_vec_info loop_vinfo, tree expr)
 static bool
 can_produce_all_loop_masks_p (loop_vec_info loop_vinfo, tree cmp_type)
 {
-  rgroup_masks *rgm;
+  rgroup_objs *rgm;
   unsigned int i;
   FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
     if (rgm->mask_type != NULL_TREE
@@ -954,12 +969,40 @@ vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
 {
   unsigned int res = 1;
   unsigned int i;
-  rgroup_masks *rgm;
+  rgroup_objs *rgm;
   FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
     res = MAX (res, rgm->max_nscalars_per_iter);
   return res;
 }
 
+/* Calculate the minimal bits necessary to represent the maximal iteration
+   count of loop with loop_vec_info LOOP_VINFO which is scaling with a given
+   factor FACTOR.  */
+
+static unsigned
+min_prec_for_max_niters (loop_vec_info loop_vinfo, unsigned int factor)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+  /* Get the maximum number of iterations that is representable
+     in the counter type.  */
+  tree ni_type = TREE_TYPE (LOOP_VINFO_NITERSM1 (loop_vinfo));
+  widest_int max_ni = wi::to_widest (TYPE_MAX_VALUE (ni_type)) + 1;
+
+  /* Get a more refined estimate for the number of iterations.  */
+  widest_int max_back_edges;
+  if (max_loop_iterations (loop, &max_back_edges))
+    max_ni = wi::smin (max_ni, max_back_edges + 1);
+
+  /* Account for factor, in which each bit is replicated N times.  */
+  max_ni *= factor;
+
+  /* Work out how many bits we need to represent the limit.  */
+  unsigned int min_ni_width = wi::min_precision (max_ni, UNSIGNED);
+
+  return min_ni_width;
+}
+
 /* Each statement in LOOP_VINFO can be masked where necessary.  Check
    whether we can actually generate the masks required.  Return true if so,
    storing the type of the scalar IV in LOOP_VINFO_MASK_COMPARE_TYPE.  */
@@ -967,7 +1010,6 @@ vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
 static bool
 vect_verify_full_masking (loop_vec_info loop_vinfo)
 {
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   unsigned int min_ni_width;
   unsigned int max_nscalars_per_iter
     = vect_get_max_nscalars_per_iter (loop_vinfo);
@@ -978,27 +1020,14 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
   if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
     return false;
 
-  /* Get the maximum number of iterations that is representable
-     in the counter type.  */
-  tree ni_type = TREE_TYPE (LOOP_VINFO_NITERSM1 (loop_vinfo));
-  widest_int max_ni = wi::to_widest (TYPE_MAX_VALUE (ni_type)) + 1;
-
-  /* Get a more refined estimate for the number of iterations.  */
-  widest_int max_back_edges;
-  if (max_loop_iterations (loop, &max_back_edges))
-    max_ni = wi::smin (max_ni, max_back_edges + 1);
-
-  /* Account for rgroup masks, in which each bit is replicated N times.  */
-  max_ni *= max_nscalars_per_iter;
-
   /* Work out how many bits we need to represent the limit.  */
-  min_ni_width = wi::min_precision (max_ni, UNSIGNED);
+  min_ni_width = min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);
 
   /* Find a scalar mode for which WHILE_ULT is supported.  */
   opt_scalar_int_mode cmp_mode_iter;
   tree cmp_type = NULL_TREE;
   tree iv_type = NULL_TREE;
-  widest_int iv_limit = vect_iv_limit_for_full_masking (loop_vinfo);
+  widest_int iv_limit = vect_iv_limit_for_partial_vect (loop_vinfo);
   unsigned int iv_precision = UINT_MAX;
 
   if (iv_limit != -1)
@@ -1056,6 +1085,33 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
   return true;
 }
 
+/* Check whether we can use vector access with length based on precision
+   comparison.  So far, to keep it simple, we only allow the case that the
+   precision of the target supported length is no less than the precision
+   required by loop niters.  */
+
+static bool
+vect_verify_loop_lens (loop_vec_info loop_vinfo)
+{
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+
+  if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
+    return false;
+
+  /* The one which has the largest NV should have max bytes per iter.  */
+  rgroup_objs *rgl = &(*lens)[lens->length () - 1];
+
+  /* Work out how many bits we need to represent the limit.  */
+  unsigned int min_ni_width
+    = min_prec_for_max_niters (loop_vinfo, rgl->nbytes_per_iter);
+
+  unsigned len_bits = GET_MODE_PRECISION (targetm.vectorize.length_mode);
+  if (len_bits < min_ni_width)
+    return false;
+
+  return true;
+}
+
 /* Calculate the cost of one scalar iteration of the loop.  */
 static void
 vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
@@ -1628,9 +1684,9 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo)
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
 
-  /* Only fully-masked loops can have iteration counts less than the
-     vectorization factor.  */
-  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  /* Only fully-masked or fully with length loops can have iteration counts
+     less than the vectorization factor.  */
+  if (!LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo))
     {
       if (known_niters_smaller_than_vf (loop_vinfo))
        {
@@ -1858,7 +1914,7 @@ determine_peel_for_niter (loop_vec_info loop_vinfo)
     th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
                                          (loop_vinfo));
 
-  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  if (LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo))
     /* The main loop handles all iterations.  */
     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
   else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
@@ -2047,7 +2103,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, unsigned *n_stmts)
       vect_optimize_slp (loop_vinfo);
     }
 
-  bool saved_can_fully_mask_p = LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo);
+  bool saved_can_partial_vect_p = LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo);
 
   /* We don't expect to have to roll back to anything other than an empty
      set of rgroups.  */
@@ -2129,10 +2185,24 @@ start_over:
       return ok;
     }
 
+  /* For now, we don't expect to mix both the masking and length approaches
+     for one loop, so disable partial vectorization if both are recorded.  */
+  if (LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo)
+      && !LOOP_VINFO_MASKS (loop_vinfo).is_empty ()
+      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ())
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "can't use a partial vectorized loop because we"
+                        " don't expect to mix partial vectorization"
+                        " approaches for the same loop.\n");
+      LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
+    }
+
   /* Decide whether to use a fully-masked loop for this vectorization
      factor.  */
   LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
-    = (LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
+    = (LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo)
        && vect_verify_full_masking (loop_vinfo));
   if (dump_enabled_p ())
     {
@@ -2144,6 +2214,50 @@ start_over:
                         "not using a fully-masked loop.\n");
     }
 
+  /* Decide whether to use vector access with length.  */
+  LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+    = (LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo)
+       && vect_verify_loop_lens (loop_vinfo));
+
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+      && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+         || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)))
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "can't use vector access with length because peeling"
+                        " for alignment or gaps is required.\n");
+      LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo) = false;
+    }
+
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      if (param_vect_with_length_scope == 0)
+       LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo) = false;
+      /* The epilogue and other cases with known niters less than VF can still
+        use vector access with length fully.  */
+      else if (param_vect_with_length_scope == 1
+              && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+              && !known_niters_smaller_than_vf (loop_vinfo))
+       {
+         LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo) = false;
+         LOOP_VINFO_EPIL_PARTIAL_VECT_P (loop_vinfo) = true;
+       }
+    }
+  else
+    /* Always set it as false in case previous tries set it.  */
+    LOOP_VINFO_EPIL_PARTIAL_VECT_P (loop_vinfo) = false;
+
+  if (dump_enabled_p ())
+    {
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+       dump_printf_loc (MSG_NOTE, vect_location, "using vector access with"
+                                                 " length for loop fully.\n");
+      else
+       dump_printf_loc (MSG_NOTE, vect_location, "not using vector access with"
+                                                 " length for loop fully.\n");
+    }
+
   /* If epilog loop is required because of data accesses with gaps,
      one additional iteration needs to be peeled.  Check if there is
      enough iterations for vectorization.  */
@@ -2163,7 +2277,7 @@ start_over:
   /* If we're vectorizing an epilogue loop, we either need a fully-masked
      loop or a loop that has a lower VF than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && !LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo)
       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
     return opt_result::failure_at (vect_location,
@@ -2362,12 +2476,13 @@ again:
     = init_cost (LOOP_VINFO_LOOP (loop_vinfo));
   /* Reset accumulated rgroup information.  */
   release_vec_loop_masks (&LOOP_VINFO_MASKS (loop_vinfo));
+  release_vec_loop_lens (&LOOP_VINFO_LENS (loop_vinfo));
   /* Reset assorted flags.  */
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
   LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = false;
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = 0;
   LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = 0;
-  LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = saved_can_fully_mask_p;
+  LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = saved_can_partial_vect_p;
 
   goto start_over;
 }
@@ -2646,8 +2761,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
              if (ordered_p (lowest_th, th))
                lowest_th = ordered_min (lowest_th, th);
            }
-         else
-           delete loop_vinfo;
+         else {
+             delete loop_vinfo;
+             loop_vinfo = opt_loop_vec_info::success (NULL);
+         }
 
          /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
             enabled, SIMDUID is not set, it is the innermost loop and we have
@@ -2672,6 +2789,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
       else
        {
          delete loop_vinfo;
+         loop_vinfo = opt_loop_vec_info::success (NULL);
          if (fatal)
            {
              gcc_checking_assert (first_loop_vinfo == NULL);
@@ -2679,6 +2797,22 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
            }
        }
 
+      /* Handle the case where the original loop can use partial vectorization
+        but we only want to adopt it for the epilogue.  The retry should use
+        the same mode as the original.  */
+      if (vect_epilogues && loop_vinfo
+         && LOOP_VINFO_EPIL_PARTIAL_VECT_P (loop_vinfo))
+       {
+         gcc_assert (LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo)
+                     && !LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo));
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "***** Re-trying analysis with same vector mode"
+                            " %s for epilogue with partial vectorization.\n",
+                            GET_MODE_NAME (loop_vinfo->vector_mode));
+         continue;
+       }
+
       if (mode_i < vector_modes.length ()
          && VECTOR_MODE_P (autodetected_vector_mode)
          && (related_vector_mode (vector_modes[mode_i],
@@ -3493,7 +3627,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
 
       /* Calculate how many masks we need to generate.  */
       unsigned int num_masks = 0;
-      rgroup_masks *rgm;
+      rgroup_objs *rgm;
       unsigned int num_vectors_m1;
       FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
        if (rgm->mask_type)
@@ -3519,6 +3653,11 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
                            target_cost_data, num_masks - 1, vector_stmt,
                            NULL, NULL_TREE, 0, vect_body);
     }
+  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      peel_iters_prologue = 0;
+      peel_iters_epilogue = 0;
+    }
   else if (npeel < 0)
     {
       peel_iters_prologue = assumed_vf / 2;
@@ -3808,7 +3947,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
                 "  Calculated minimum iters for profitability: %d\n",
                 min_profitable_iters);
 
-  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+  if (!LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo)
       && min_profitable_iters < (assumed_vf + peel_iters_prologue))
     /* We want the vectorized loop to execute at least once.  */
     min_profitable_iters = assumed_vf + peel_iters_prologue;
@@ -6761,6 +6900,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
     dump_printf_loc (MSG_NOTE, vect_location,
                     "using an in-order (fold-left) reduction.\n");
   STMT_VINFO_TYPE (orig_stmt_of_analysis) = cycle_phi_info_type;
+
   /* All but single defuse-cycle optimized, lane-reducing and fold-left
      reductions go through their own vectorizable_* routines.  */
   if (!single_defuse_cycle
@@ -6779,7 +6919,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       STMT_VINFO_DEF_TYPE (vect_orig_stmt (tem)) = vect_internal_def;
       STMT_VINFO_DEF_TYPE (tem) = vect_internal_def;
     }
-  else if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
+  else if (loop_vinfo && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
     {
       vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
       internal_fn cond_fn = get_conditional_internal_fn (code);
@@ -6792,9 +6932,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "can't use a fully-masked loop because no"
+                            "can't use a partial vectorized loop because no"
                             " conditional operation is available.\n");
-         LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+         LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
        }
       else if (reduction_type == FOLD_LEFT_REDUCTION
               && reduc_fn == IFN_LAST
@@ -6804,9 +6944,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "can't use a fully-masked loop because no"
+                            "can't use a partial vectorized loop because no"
                             " conditional operation is available.\n");
-         LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+         LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
        }
       else
        vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
@@ -8005,33 +8145,33 @@ vectorizable_live_operation (loop_vec_info loop_vinfo,
   if (!vec_stmt_p)
     {
       /* No transformation required.  */
-      if (LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
+      if (LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
        {
          if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
                                               OPTIMIZE_FOR_SPEED))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                "can't use a fully-masked loop because "
+                                "can't use a partial vectorized loop because "
                                 "the target doesn't support extract last "
                                 "reduction.\n");
-             LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+             LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
            }
          else if (slp_node)
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                "can't use a fully-masked loop because an "
-                                "SLP statement is live after the loop.\n");
-             LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+                                "can't use a partial vectorized loop because "
+                                "an SLP statement is live after the loop.\n");
+             LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
            }
          else if (ncopies > 1)
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                "can't use a fully-masked loop because"
+                                "can't use a partial vectorized loop because"
                                 " ncopies is greater than 1.\n");
-             LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+             LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
            }
          else
            {
@@ -8041,6 +8181,7 @@ vectorizable_live_operation (loop_vec_info loop_vinfo,
                                     1, vectype, NULL);
            }
        }
+
       return true;
     }
 
@@ -8285,7 +8426,7 @@ vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
   gcc_assert (nvectors != 0);
   if (masks->length () < nvectors)
     masks->safe_grow_cleared (nvectors);
-  rgroup_masks *rgm = &(*masks)[nvectors - 1];
+  rgroup_objs *rgm = &(*masks)[nvectors - 1];
   /* The number of scalars per iteration and the number of vectors are
      both compile-time constants.  */
   unsigned int nscalars_per_iter
@@ -8316,24 +8457,24 @@ tree
 vect_get_loop_mask (gimple_stmt_iterator *gsi, vec_loop_masks *masks,
                    unsigned int nvectors, tree vectype, unsigned int index)
 {
-  rgroup_masks *rgm = &(*masks)[nvectors - 1];
+  rgroup_objs *rgm = &(*masks)[nvectors - 1];
   tree mask_type = rgm->mask_type;
 
   /* Populate the rgroup's mask array, if this is the first time we've
      used it.  */
-  if (rgm->masks.is_empty ())
+  if (rgm->objs.is_empty ())
     {
-      rgm->masks.safe_grow_cleared (nvectors);
+      rgm->objs.safe_grow_cleared (nvectors);
       for (unsigned int i = 0; i < nvectors; ++i)
        {
          tree mask = make_temp_ssa_name (mask_type, NULL, "loop_mask");
          /* Provide a dummy definition until the real one is available.  */
          SSA_NAME_DEF_STMT (mask) = gimple_build_nop ();
-         rgm->masks[i] = mask;
+         rgm->objs[i] = mask;
        }
     }
 
-  tree mask = rgm->masks[index];
+  tree mask = rgm->objs[index];
   if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
                TYPE_VECTOR_SUBPARTS (vectype)))
     {
@@ -8354,6 +8495,66 @@ vect_get_loop_mask (gimple_stmt_iterator *gsi, vec_loop_masks *masks,
   return mask;
 }
 
+/* Record that LOOP_VINFO would need LENS to contain a sequence of NVECTORS
+   lengths for vector access with length that each control a vector of type
+   VECTYPE.  */
+
+void
+vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
+                      unsigned int nvectors, tree vectype)
+{
+  gcc_assert (nvectors != 0);
+  if (lens->length () < nvectors)
+    lens->safe_grow_cleared (nvectors);
+  rgroup_objs *rgl = &(*lens)[nvectors - 1];
+
+  /* The number of scalars per iteration, their total bytes and the number of
+     vectors are all compile-time constants.  */
+  poly_uint64 vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  unsigned int nbytes_per_iter
+    = exact_div (nvectors * vector_size, vf).to_constant ();
+
+  /* The one associated to the same nvectors should have the same bytes per
+     iteration.  */
+  if (!rgl->vec_type)
+    {
+      rgl->vec_type = vectype;
+      rgl->nbytes_per_iter = nbytes_per_iter;
+    }
+  else
+    gcc_assert (rgl->nbytes_per_iter == nbytes_per_iter);
+}
+
+/* Given a complete set of length LENS, extract length number INDEX for an
+   rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
+
+tree
+vect_get_loop_len (vec_loop_lens *lens, unsigned int nvectors, unsigned int index)
+{
+  rgroup_objs *rgl = &(*lens)[nvectors - 1];
+
+  /* Populate the rgroup's len array, if this is the first time we've
+     used it.  */
+  if (rgl->objs.is_empty ())
+    {
+      rgl->objs.safe_grow_cleared (nvectors);
+      for (unsigned int i = 0; i < nvectors; ++i)
+       {
+         scalar_int_mode len_mode = targetm.vectorize.length_mode;
+         unsigned int len_prec = GET_MODE_PRECISION (len_mode);
+         tree len_type = build_nonstandard_integer_type (len_prec, true);
+         tree len = make_temp_ssa_name (len_type, NULL, "loop_len");
+
+         /* Provide a dummy definition until the real one is available.  */
+         SSA_NAME_DEF_STMT (len) = gimple_build_nop ();
+         rgl->objs[i] = len;
+       }
+    }
+
+  return rgl->objs[index];
+}
+
 /* Scale profiling counters by estimation for LOOP which is vectorized
    by factor VF.  */
 
@@ -8713,7 +8914,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   if (niters_vector == NULL_TREE)
     {
       if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-         && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+         && !LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo)
          && known_eq (lowest_vf, vf))
        {
          niters_vector
@@ -8881,7 +9082,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* True if the final iteration might not handle a full vector's
      worth of scalar iterations.  */
-  bool final_iter_may_be_partial = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool final_iter_may_be_partial = LOOP_VINFO_PARTIAL_VECT_P (loop_vinfo);
   /* The minimum number of iterations performed by the epilogue.  This
      is 1 when peeling for gaps because we always need a final scalar
      iteration.  */
@@ -9184,12 +9385,14 @@ optimize_mask_stores (class loop *loop)
 }
 
 /* Decide whether it is possible to use a zero-based induction variable
-   when vectorizing LOOP_VINFO with a fully-masked loop.  If it is,
-   return the value that the induction variable must be able to hold
-   in order to ensure that the loop ends with an all-false mask.
+   when vectorizing LOOP_VINFO with a fully-masked or fully length-based
+   loop.  If it is, return the value that the induction variable must
+   be able to hold in order to ensure that the loop ends with an
+   all-false mask or a zero length.
    Return -1 otherwise.  */
+
 widest_int
-vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo)
+vect_iv_limit_for_partial_vect (loop_vec_info loop_vinfo)
 {
   tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e7822c44951..1bd2d2bd581 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1771,9 +1771,9 @@ static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
                                  gimple_stmt_iterator *);
 
 /* Check whether a load or store statement in the loop described by
-   LOOP_VINFO is possible in a fully-masked loop.  This is testing
-   whether the vectorizer pass has the appropriate support, as well as
-   whether the target does.
+   LOOP_VINFO is possible in a fully-masked or fully length-based loop.
+   This is testing whether the vectorizer pass has the appropriate support,
+   as well as whether the target does.
 
    VLS_TYPE says whether the statement is a load or store and VECTYPE
    is the type of the vector being loaded or stored.  MEMORY_ACCESS_TYPE
@@ -1783,14 +1783,14 @@ static tree permute_vec_elements (vec_info *, tree, tree, stmt_vec_info,
    its arguments.  If the load or store is conditional, SCALAR_MASK is the
    condition under which it occurs.
 
-   Clear LOOP_VINFO_CAN_FULLY_MASK_P if a fully-masked loop is not
-   supported, otherwise record the required mask types.  */
+   Clear LOOP_VINFO_CAN_PARTIAL_VECT_P if a fully-masked or fully length-based
+   loop is not supported, otherwise record the required masks or lengths.  */
 
 static void
-check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
-                         vec_load_store_type vls_type, int group_size,
-                         vect_memory_access_type memory_access_type,
-                         gather_scatter_info *gs_info, tree scalar_mask)
+check_load_store_partial_vect (loop_vec_info loop_vinfo, tree vectype,
+                              vec_load_store_type vls_type, int group_size,
+                              vect_memory_access_type memory_access_type,
+                              gather_scatter_info *gs_info, tree scalar_mask)
 {
   /* Invariant loads need no special support.  */
   if (memory_access_type == VMAT_INVARIANT)
@@ -1807,10 +1807,10 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "can't use a fully-masked loop because the"
-                            " target doesn't have an appropriate masked"
+                            "can't use a partially vectorized loop because"
+                            " the target doesn't have an appropriate"
                             " load/store-lanes instruction.\n");
-         LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+         LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
          return;
        }
       unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype);
@@ -1830,10 +1830,10 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "can't use a fully-masked loop because the"
-                            " target doesn't have an appropriate masked"
+                            "can't use a partially vectorized loop because"
+                            " the target doesn't have an appropriate"
                             " gather load or scatter store instruction.\n");
-         LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+         LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
          return;
        }
       unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype);
@@ -1848,35 +1848,61 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
         scalar loop.  We need more work to support other mappings.  */
       if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                        "can't use a fully-masked loop because an access"
-                        " isn't contiguous.\n");
-      LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+                        "can't use a partially vectorized loop because an"
+                        " access isn't contiguous.\n");
+      LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
       return;
     }
 
-  machine_mode mask_mode;
-  if (!VECTOR_MODE_P (vecmode)
-      || !targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode)
-      || !can_vec_mask_load_store_p (vecmode, mask_mode, is_load))
+  if (!VECTOR_MODE_P (vecmode))
     {
       if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                        "can't use a fully-masked loop because the target"
-                        " doesn't have the appropriate masked load or"
-                        " store.\n");
-      LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+                        "can't use a partially vectorized loop because"
+                        " the mode is not a vector mode.\n");
+      LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
       return;
     }
-  /* We might load more scalars than we need for permuting SLP loads.
-     We checked in get_group_load_store_type that the extra elements
-     don't leak into a new vector.  */
+
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned int nvectors;
-  if (can_div_away_from_zero_p (group_size * vf, nunits, &nvectors))
-    vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask);
-  else
-    gcc_unreachable ();
+  machine_mode mask_mode;
+  bool partial_vectorized_p = false;
+  if (targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode)
+      && can_vec_mask_load_store_p (vecmode, mask_mode, is_load))
+    {
+      /* We might load more scalars than we need for permuting SLP loads.
+        We checked in get_group_load_store_type that the extra elements
+        don't leak into a new vector.  */
+      if (can_div_away_from_zero_p (group_size * vf, nunits, &nvectors))
+       vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
+                              scalar_mask);
+      else
+       gcc_unreachable ();
+      partial_vectorized_p = true;
+    }
+
+  optab op = is_load ? lenload_optab : lenstore_optab;
+  if (convert_optab_handler (op, vecmode, targetm.vectorize.length_mode))
+    {
+      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+      if (can_div_away_from_zero_p (group_size * vf, nunits, &nvectors))
+       vect_record_loop_len (loop_vinfo, lens, nvectors, vectype);
+      else
+       gcc_unreachable ();
+      partial_vectorized_p = true;
+    }
+
+  if (!partial_vectorized_p)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "can't use a partially vectorized loop because the"
+                        " target doesn't have the appropriate partially"
+                        " vectorized load or store.\n");
+      LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
+    }
 }
 
 /* Return the mask input to a masked load or store.  VEC_MASK is the vectorized
@@ -6187,7 +6213,7 @@ vectorizable_operation (vec_info *vinfo,
         should only change the active lanes of the reduction chain,
         keeping the inactive lanes as-is.  */
       if (loop_vinfo
-         && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
+         && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo)
          && reduc_idx >= 0)
        {
          if (cond_fn == IFN_LAST
@@ -6198,7 +6224,7 @@ vectorizable_operation (vec_info *vinfo,
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "can't use a fully-masked loop because no"
                                 " conditional operation is available.\n");
-             LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+             LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
            }
          else
            vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
@@ -7527,10 +7553,10 @@ vectorizable_store (vec_info *vinfo,
     {
       STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
 
-      if (loop_vinfo
-         && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
-       check_load_store_masking (loop_vinfo, vectype, vls_type, group_size,
-                                 memory_access_type, &gs_info, mask);
+      if (loop_vinfo && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
+       check_load_store_partial_vect (loop_vinfo, vectype, vls_type,
+                                      group_size, memory_access_type, &gs_info,
+                                      mask);
 
       if (slp_node
          && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
@@ -8068,6 +8094,15 @@ vectorizable_store (vec_info *vinfo,
     = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
        ? &LOOP_VINFO_MASKS (loop_vinfo)
        : NULL);
+
+  vec_loop_lens *loop_lens
+    = (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+        ? &LOOP_VINFO_LENS (loop_vinfo)
+        : NULL);
+
+  /* Shouldn't use lengths if the loop is fully masked.  */
+  gcc_assert (!(loop_lens && loop_masks));
+
   /* Targets with store-lane instructions must not require explicit
      realignment.  vect_supportable_dr_alignment always returns either
      dr_aligned or dr_unaligned_supported for masked operations.  */
@@ -8320,10 +8355,15 @@ vectorizable_store (vec_info *vinfo,
              unsigned HOST_WIDE_INT align;
 
              tree final_mask = NULL_TREE;
+             tree final_len = NULL_TREE;
              if (loop_masks)
                final_mask = vect_get_loop_mask (gsi, loop_masks,
                                                 vec_num * ncopies,
                                                 vectype, vec_num * j + i);
+             else if (loop_lens)
+               final_len = vect_get_loop_len (loop_lens, vec_num * ncopies,
+                                              vec_num * j + i);
+
              if (vec_mask)
                final_mask = prepare_load_store_mask (mask_vectype, final_mask,
                                                      vec_mask, gsi);
@@ -8403,6 +8443,17 @@ vectorizable_store (vec_info *vinfo,
                  new_stmt_info
                    = vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
                }
+             else if (final_len)
+               {
+                 align = least_bit_hwi (misalign | align);
+                 tree ptr = build_int_cst (ref_type, align);
+                 gcall *call
+                   = gimple_build_call_internal (IFN_LEN_STORE, 4, dataref_ptr,
+                                                 ptr, final_len, vec_oprnd);
+                 gimple_call_set_nothrow (call, true);
+                 new_stmt_info
+                   = vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
+               }
              else
                {
                  data_ref = fold_build2 (MEM_REF, vectype,
@@ -8834,10 +8885,10 @@ vectorizable_load (vec_info *vinfo,
       if (!slp)
        STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
 
-      if (loop_vinfo
-         && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
-       check_load_store_masking (loop_vinfo, vectype, VLS_LOAD, group_size,
-                                 memory_access_type, &gs_info, mask);
+      if (loop_vinfo && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
+       check_load_store_partial_vect (loop_vinfo, vectype, VLS_LOAD,
+                                      group_size, memory_access_type, &gs_info,
+                                      mask);
 
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
       vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
@@ -8937,6 +8988,7 @@ vectorizable_load (vec_info *vinfo,
 
       gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
       gcc_assert (!nested_in_vect_loop);
+      gcc_assert (!LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo));
 
       if (grouped_load)
        {
@@ -9234,6 +9286,15 @@ vectorizable_load (vec_info *vinfo,
     = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
        ? &LOOP_VINFO_MASKS (loop_vinfo)
        : NULL);
+
+  vec_loop_lens *loop_lens
+    = (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+        ? &LOOP_VINFO_LENS (loop_vinfo)
+        : NULL);
+
+  /* Shouldn't use lengths if the loop is fully masked.  */
+  gcc_assert (!(loop_lens && loop_masks));
+
   /* Targets with store-lane instructions must not require explicit
      realignment.  vect_supportable_dr_alignment always returns either
      dr_aligned or dr_unaligned_supported for masked operations.  */
@@ -9555,15 +9616,20 @@ vectorizable_load (vec_info *vinfo,
          for (i = 0; i < vec_num; i++)
            {
              tree final_mask = NULL_TREE;
+             tree final_len = NULL_TREE;
              if (loop_masks
                  && memory_access_type != VMAT_INVARIANT)
                final_mask = vect_get_loop_mask (gsi, loop_masks,
                                                 vec_num * ncopies,
                                                 vectype, vec_num * j + i);
+             else if (loop_lens && memory_access_type != VMAT_INVARIANT)
+               final_len = vect_get_loop_len (loop_lens, vec_num * ncopies,
+                                              vec_num * j + i);
              if (vec_mask)
                final_mask = prepare_load_store_mask (mask_vectype, final_mask,
                                                      vec_mask, gsi);
 
+
              if (i > 0)
                dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
                                               gsi, stmt_info, bump);
@@ -9629,6 +9695,18 @@ vectorizable_load (vec_info *vinfo,
                        new_stmt = call;
                        data_ref = NULL_TREE;
                      }
+                   else if (final_len)
+                     {
+                       align = least_bit_hwi (misalign | align);
+                       tree ptr = build_int_cst (ref_type, align);
+                       gcall *call
+                         = gimple_build_call_internal (IFN_LEN_LOAD, 3,
+                                                       dataref_ptr, ptr,
+                                                       final_len);
+                       gimple_call_set_nothrow (call, true);
+                       new_stmt = call;
+                       data_ref = NULL_TREE;
+                     }
                    else
                      {
                        tree ltype = vectype;
@@ -10279,11 +10357,16 @@ vectorizable_condition (vec_info *vinfo,
          return false;
        }
 
-      if (loop_vinfo
-         && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
-         && reduction_type == EXTRACT_LAST_REDUCTION)
-       vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
-                              ncopies * vec_num, vectype, NULL);
+      /* For reductions, only EXTRACT_LAST_REDUCTION is supported so far.  */
+      if (loop_vinfo && for_reduction
+         && LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo))
+       {
+         if (reduction_type == EXTRACT_LAST_REDUCTION)
+           vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
+                                  ncopies * vec_num, vectype, NULL);
+         else
+           LOOP_VINFO_CAN_PARTIAL_VECT_P (loop_vinfo) = false;
+       }
 
       STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
       vect_model_simple_cost (vinfo, stmt_info, ncopies, dts, ndts, slp_node,
@@ -12480,3 +12563,35 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = nunits_vectype;
   return opt_result::success ();
 }
+
+/* Generate and return a statement sequence that sets the vector length LEN:
+
+   min_of_start_and_end = min (START_INDEX, END_INDEX);
+   left_bytes = END_INDEX - min_of_start_and_end;
+   rhs = min (left_bytes, VECTOR_SIZE);
+   LEN = rhs;
+
+   TODO: for now, the rs6000 vector-with-length support only cares about
+   8 bits of the length, so if left_bytes is larger than 255 it cannot
+   simply be saturated to the vector size.  A target hook can be provided
+   if other ports don't have this limitation.  */
+
+gimple_seq
+vect_gen_len (tree len, tree start_index, tree end_index, tree vector_size)
+{
+  gimple_seq stmts = NULL;
+  tree len_type = TREE_TYPE (len);
+  gcc_assert (TREE_TYPE (start_index) == len_type);
+
+  tree min = fold_build2 (MIN_EXPR, len_type, start_index, end_index);
+  tree left_bytes = fold_build2 (MINUS_EXPR, len_type, end_index, min);
+  left_bytes = fold_build2 (MIN_EXPR, len_type, left_bytes, vector_size);
+
+  tree rhs = force_gimple_operand (left_bytes, &stmts, true, NULL_TREE);
+  gimple *new_stmt = gimple_build_assign (len, rhs);
+  gimple_stmt_iterator i = gsi_last (stmts);
+  gsi_insert_after_without_update (&i, new_stmt, GSI_CONTINUE_LINKING);
+
+  return stmts;
+}
+
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2eb3ab5d280..9d84766d724 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -461,20 +461,32 @@ is_a_helper <_bb_vec_info *>::test (vec_info *i)
    first level being indexed by nV - 1 (since nV == 0 doesn't exist) and
    the second being indexed by the mask index 0 <= i < nV.  */
 
-/* The masks needed by rgroups with nV vectors, according to the
-   description above.  */
-struct rgroup_masks {
-  /* The largest nS for all rgroups that use these masks.  */
-  unsigned int max_nscalars_per_iter;
-
-  /* The type of mask to use, based on the highest nS recorded above.  */
-  tree mask_type;
+/* The masks or lengths (collectively called "objects") needed by rgroups
+   with nV vectors, according to the description above.  */
+struct rgroup_objs {
+  union
+  {
+    /* The largest nS for all rgroups that use these masks.  */
+    unsigned int max_nscalars_per_iter;
+    /* The total number of bytes for the nS scalars per iteration.  */
+    unsigned int nbytes_per_iter;
+  };
 
-  /* A vector of nV masks, in iteration order.  */
-  vec<tree> masks;
+  union
+  {
+    /* The type of mask to use, based on the highest nS recorded above.  */
+    tree mask_type;
+    /* Any vector type that uses these lengths.  */
+    tree vec_type;
+  };
+
+  /* A vector of nV objects, in iteration order.  */
+  vec<tree> objs;
 };
 
-typedef auto_vec<rgroup_masks> vec_loop_masks;
+typedef auto_vec<rgroup_objs> vec_loop_masks;
+
+typedef auto_vec<rgroup_objs> vec_loop_lens;
 
 typedef auto_vec<std::pair<data_reference*, tree> > drs_init_vec;
 
@@ -523,6 +535,10 @@ public:
      on inactive scalars.  */
   vec_loop_masks masks;
 
+  /* The lengths that a length-based loop should use to avoid operating
+     on inactive scalars.  */
+  vec_loop_lens lens;
+
   /* Set of scalar conditions that have loop mask applied.  */
   scalar_cond_masked_set_type scalar_cond_masked_set;
 
@@ -620,12 +636,20 @@ public:
   /* Is the loop vectorizable? */
   bool vectorizable;
 
-  /* Records whether we still have the option of using a fully-masked loop.  */
-  bool can_fully_mask_p;
+  /* Records whether we can use partial vector approaches for this loop;
+     for now masking and length-based approaches are supported.  */
+  bool can_partial_vect_p;
 
   /* True if have decided to use a fully-masked loop.  */
   bool fully_masked_p;
 
+  /* True if we have decided to use a fully length-based loop.  */
+  bool fully_with_length_p;
+
+  /* Records whether we can use partial vector approaches for the epilogue
+     of this loop; for now only the length-based approach is supported.  */
+  bool epil_partial_vect_p;
+
   /* When we have grouped data accesses with gaps, we may introduce invalid
      memory accesses.  We peel the last iteration of the loop to prevent
      this.  */
@@ -687,8 +711,11 @@ public:
 #define LOOP_VINFO_COST_MODEL_THRESHOLD(L) (L)->th
 #define LOOP_VINFO_VERSIONING_THRESHOLD(L) (L)->versioning_threshold
 #define LOOP_VINFO_VECTORIZABLE_P(L)       (L)->vectorizable
-#define LOOP_VINFO_CAN_FULLY_MASK_P(L)     (L)->can_fully_mask_p
+#define LOOP_VINFO_CAN_PARTIAL_VECT_P(L)   (L)->can_partial_vect_p
 #define LOOP_VINFO_FULLY_MASKED_P(L)       (L)->fully_masked_p
+#define LOOP_VINFO_FULLY_WITH_LENGTH_P(L)  (L)->fully_with_length_p
+#define LOOP_VINFO_EPIL_PARTIAL_VECT_P(L)  (L)->epil_partial_vect_p
+#define LOOP_VINFO_LENS(L)                 (L)->lens
 #define LOOP_VINFO_VECT_FACTOR(L)          (L)->vectorization_factor
 #define LOOP_VINFO_MAX_VECT_FACTOR(L)      (L)->max_vectorization_factor
 #define LOOP_VINFO_MASKS(L)                (L)->masks
@@ -741,6 +768,10 @@ public:
    || LOOP_REQUIRES_VERSIONING_FOR_NITERS (L)          \
    || LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (L))
 
+/* Whether the loop operates on partial vectors.  */
+#define LOOP_VINFO_PARTIAL_VECT_P(L)                                           \
+  (LOOP_VINFO_FULLY_MASKED_P (L) || LOOP_VINFO_FULLY_WITH_LENGTH_P (L))
+
 #define LOOP_VINFO_NITERS_KNOWN_P(L)          \
   (tree_fits_shwi_p ((L)->num_iters) && tree_to_shwi ((L)->num_iters) > 0)
 
@@ -1824,7 +1855,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
                                                  tree, tree = NULL_TREE);
 
 /* In tree-vect-loop.c.  */
-extern widest_int vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo);
+extern widest_int vect_iv_limit_for_partial_vect (loop_vec_info loop_vinfo);
 /* Used in tree-vect-loop-manip.c */
 extern void determine_peel_for_niter (loop_vec_info);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
@@ -1842,6 +1873,10 @@ extern void vect_record_loop_mask (loop_vec_info, vec_loop_masks *,
                                   unsigned int, tree, tree);
 extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
                                unsigned int, tree, unsigned int);
+extern void vect_record_loop_len (loop_vec_info, vec_loop_lens *, unsigned int,
+                                 tree);
+extern tree vect_get_loop_len (vec_loop_lens *, unsigned int, unsigned int);
+extern gimple_seq vect_gen_len (tree, tree, tree, tree);
 extern stmt_vec_info info_for_reduction (vec_info *, stmt_vec_info);
 
 /* Drive for loop transformation stage.  */

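To make the intent of vect_gen_len more concrete, here is a tiny stand-alone
C model of the computation it emits.  The function name, the unsigned int
types and the small driver below are purely illustrative; the real code
builds gimple statements in the target's length mode.

#include <stdio.h>

/* Scalar model of the sequence vect_gen_len builds: the length for the
   current iteration is the number of bytes left, capped at the vector
   size (see the saturation caveat in the TODO above).  */
static unsigned int
loop_len_for_iter (unsigned int start_index, unsigned int end_index,
		   unsigned int vector_size)
{
  unsigned int min_of_start_and_end
    = start_index < end_index ? start_index : end_index;
  unsigned int left_bytes = end_index - min_of_start_and_end;
  return left_bytes < vector_size ? left_bytes : vector_size;
}

int
main (void)
{
  /* 37 bytes processed with 16-byte vectors gives lengths 16, 16, 5.  */
  for (unsigned int i = 0; i < 37; i += 16)
    printf ("start=%u len=%u\n", i, loop_len_for_iter (i, 37, 16));
  return 0;
}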