The following fixes the computation of supports_partial_vectors, which is used to prune the set of modes to iterate over for epilog vectorization. The partial_vectors_supported_p predicate it uses only looks for while_ult, but we also support predication when mask modes are integer modes, as for AVX512.
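
To illustrate the intended condition outside of GCC, here is a minimal standalone sketch. It is not the vectorizer code: TargetModel, MaskKind and the three example targets are hypothetical stand-ins, with has_while_ult modelling partial_vectors_supported_p () and mask modelling whether targetm.vectorize.get_mask_mode yields a scalar integer mode for the main loop's vector mode.

```cpp
// Hypothetical model of the patched condition; none of these names exist in GCC.
enum class MaskKind { None, Vector, ScalarInt };

struct TargetModel {
  bool has_while_ult;        // models partial_vectors_supported_p ()
  bool main_mode_is_vector;  // models VECTOR_MODE_P (first_loop_vinfo->vector_mode)
  MaskKind mask;             // models the kind of mode get_mask_mode returns
};

// Partial vectors are considered only if the --param allows them and the
// target either has while_ult or uses integer (AVX512 style) mask modes.
constexpr bool supports_partial_vectors(const TargetModel &t, int usage) {
  return usage != 0
         && (t.has_while_ult
             || (t.main_mode_is_vector && t.mask == MaskKind::ScalarInt));
}

// Assumed example targets, for illustration only.
constexpr TargetModel avx512_like{false, true, MaskKind::ScalarInt};
constexpr TargetModel while_ult_like{true, true, MaskKind::Vector};
constexpr TargetModel no_masking{false, true, MaskKind::None};

static_assert(supports_partial_vectors(avx512_like, 1),
              "integer mask modes count as predication support");
static_assert(!supports_partial_vectors(avx512_like, 0),
              "a --param value of 0 disables partial vectors");
static_assert(supports_partial_vectors(while_ult_like, 2),
              "while_ult targets behave as before");
static_assert(!supports_partial_vectors(no_masking, 1),
              "no while_ult and no integer masks: modes are still pruned");
```

Under the old condition the avx512_like case would have been false, which is the case the patch changes.
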

I've noticed this isn't very effective on x86_64 anyway since if the main loop mode is autodetected we skip re-analyzing mode_i == 0, but then mode_i == 1 is usually the very same large mode. Thus I do wonder if we should instead always (or when --param vect-partial-vector-usage != 0, or when the target would support predication in principle) perform main loop analysis with partial vectors in mind (start with can_use_partial_vectors_p = true), but only at the end honor the --param when deciding on using_partial_vectors_p. We can then remember can_use_partial_vectors_p for each analyzed mode and use that more specific info for the pruning?

For the missed skipping we probably want to increment mode_i based on vect_chooses_same_modes_p, like we do in vect_analyze_loop_1. I'll propose a patch for this - but this would regress --param vect-partial-vector-usage=1 on x86 without the patch below.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

	* tree-vect-loop.cc (vect_analyze_loop): Consider AVX512 style
	masking when computing supports_partial_vectors.
---
 gcc/tree-vect-loop.cc | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..b91ef4a2325 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3742,8 +3742,15 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
   vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  bool supports_partial_vectors = param_vect_partial_vector_usage != 0;
+  machine_mode mask_mode;
+  if (supports_partial_vectors
+      && !partial_vectors_supported_p ()
+      && !(VECTOR_MODE_P (first_loop_vinfo->vector_mode)
+           && targetm.vectorize.get_mask_mode
+                (first_loop_vinfo->vector_mode).exists (&mask_mode)
+           && SCALAR_INT_MODE_P (mask_mode)))
+    supports_partial_vectors = false;
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
 
   loop_vec_info orig_loop_vinfo = first_loop_vinfo;
-- 
2.43.0