https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95961
            Bug ID: 95961
           Summary: ICE: in exact_div, at poly-int.h:2182
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: felix.yang at huawei dot com
  Target Milestone: ---
            Target: aarch64

test case:

$ cat foo.c
typedef struct {
  unsigned short mprr_2[5][16][16];
} ImageParameters;

int s[16][2];

void intrapred_luma_16x16(ImageParameters *img, int s0)
{
  for (int j = 0; j < 16; j++)
    for (int i = 0; i < 16; i++)
      {
        img->mprr_2[1][j][i] = s[j][1];
        img->mprr_2[2][j][i] = s0;
      }
}

Command line to reproduce:

$ gcc -O3 -march=armv8.2-a+sve -fno-vect-cost-model foo.c

Call trace:

during GIMPLE pass: vect
dump file: a-foo.c.163t.vect
foo.c: In function ‘intrapred_luma_16x16’:
foo.c:7:6: internal compiler error: in exact_div, at poly-int.h:2182
    7 | void intrapred_luma_16x16(ImageParameters *img, int s0)
      |      ^~~~~~~~~~~~~~~~~~~~
0xdb6937 poly_int<2u, poly_result<unsigned long, unsigned long, poly_coeff_pair_traits<unsigned long, unsigned long>::result_kind>::type> exact_div<2u, unsigned long, unsigned long>(poly_int_pod<2u, unsigned long> const&, poly_int_pod<2u, unsigned long> const&)
        ../../gcc-git/gcc/poly-int.h:2182
0x22934ef vect_get_num_vectors
        ../../gcc-git/gcc/tree-vectorizer.h:1647
0x2297d5f vect_enhance_data_refs_alignment(_loop_vec_info*)
        ../../gcc-git/gcc/tree-vect-data-refs.c:1827
0x1686adf vect_analyze_loop_2
        ../../gcc-git/gcc/tree-vect-loop.c:2138
0x1688267 vect_analyze_loop(loop*, vec_info_shared*)
        ../../gcc-git/gcc/tree-vect-loop.c:2612
0x16c77e7 try_vectorize_loop_1
        ../../gcc-git/gcc/tree-vectorizer.c:955
0x16c7f6f try_vectorize_loop
        ../../gcc-git/gcc/tree-vectorizer.c:1110
0x16c811f vectorize_loops()
        ../../gcc-git/gcc/tree-vectorizer.c:1189
0x151e6df execute
        ../../gcc-git/gcc/tree-ssa-loop.c:414

In vect_enhance_data_refs_alignment, when we call vect_get_num_vectors, we have:

(gdb) p nscalars
$11 = {<poly_int_pod<2, unsigned long>> = {coeffs = {2, 2}}, <No data fields>}
(gdb) p debug_tree(vectype)
 <vector_type 0xffffb202f2a0
    type <integer_type 0xffffb22305e8 int sizes-gimplified public SI
        size <integer_cst 0xffffb221ffd8 constant 32>
        unit-size <integer_cst 0xffffb2234000 constant 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0xffffb22305e8 precision:32 min <integer_cst 0xffffb221ff90 -2147483648> max <integer_cst 0xffffb221ffa8 2147483647>
        pointer_to_this <pointer_type 0xffffb2238a80>>
    VNx4SI
    ......
(gdb) p TYPE_VECTOR_SUBPARTS (vectype)
$13 = {<poly_int_pod<2, unsigned long>> = {coeffs = {4, 4}}, <No data fields>}

nscalars ([2,2]) is not a multiple of the number of elements of vectype ([4,4]), which triggers the ICE.

In the vect pass, the vectorization factor computed by vect_determine_vectorization_factor is [8,8].
But it is later updated to [1,1] by vect_update_vf_for_slp, as indicated in the phase dump:

foo.c:9:3: note:   === vect_make_slp_decision ===
foo.c:9:3: note:   Decided to SLP 2 instances. Unrolling factor [1,1]
foo.c:9:3: note:   === vect_detect_hybrid_slp ===
foo.c:9:3: note:   === vect_update_vf_for_slp ===
foo.c:9:3: note:   Loop contains only SLP stmts
foo.c:9:3: note:   Updating vectorization factor to [1,1].
foo.c:9:3: note:  vectorization_factor = [1,1], niters = 16

The logic here was changed by commit d9f21f6acb3aa615834e855e16b6311cd18c5668:

   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
     {
-      if (STMT_SLP_TYPE (stmt_info))
-        possible_npeel_number
-          = (vf * GROUP_SIZE (stmt_info)) / nelements;
-      else
-        possible_npeel_number = vf / nelements;
+      poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
+                              ? vf * GROUP_SIZE (stmt_info) : vf);
+      possible_npeel_number
+        = vect_get_num_vectors (nscalars, vectype);

Proposed fix:

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index eb8288e7a85..b30a7d8a3bb 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                    {
                      poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
                                              ? vf * DR_GROUP_SIZE (stmt_info) : vf);
-                     possible_npeel_number
-                       = vect_get_num_vectors (nscalars, vectype);
+                     if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
+                       possible_npeel_number = 0;
+                     else
+                       possible_npeel_number
+                         = vect_get_num_vectors (nscalars, vectype);

                      /* NPEEL_TMP is 0 when there is no misalignment, but also
                         allow peeling NELEMENTS.  */
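
For readers unfamiliar with poly_int, the arithmetic behind the ICE and the proposed guard can be sketched in plain C. This is only a simplified model, not GCC code: the type poly2 and the model_* functions are hypothetical stand-ins for a two-coefficient poly_uint64 (value = c0 + c1 * x, with x >= 0 unknown at compile time), and the multiple check is assumed to require a single constant quotient.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a two-coefficient poly_uint64:
   the runtime value is c0 + c1 * x for some unknown x >= 0.  */
typedef struct { unsigned long c0, c1; } poly2;

/* Model of maybe_lt: a might be less than b for some x >= 0
   if either coefficient of a is smaller than that of b.  */
static bool model_maybe_lt(poly2 a, poly2 b)
{
  return a.c0 < b.c0 || a.c1 < b.c1;
}

/* Return true and set *q if a == q * b for every x, i.e. both
   coefficients of a are the same constant multiple of b's.  */
static bool model_constant_multiple(poly2 a, poly2 b, unsigned long *q)
{
  if (b.c0 == 0 || a.c0 % b.c0 != 0)
    return false;
  unsigned long cand = a.c0 / b.c0;
  if (a.c1 != cand * b.c1)
    return false;
  *q = cand;
  return true;
}

/* Model of an exact division that aborts when a is not a multiple
   of b, which is the situation reported as the ICE.  */
static unsigned long model_exact_div(poly2 a, poly2 b)
{
  unsigned long q = 0;
  bool ok = model_constant_multiple(a, b, &q);
  assert(ok && "a is not a multiple of b");
  return q;
}

/* Model of the guarded computation in the proposed fix: fall back
   to 0 when nscalars may be smaller than the vector element count.  */
static unsigned long model_possible_npeel(poly2 nscalars, poly2 nunits)
{
  if (model_maybe_lt(nscalars, nunits))
    return 0;
  return model_exact_div(nscalars, nunits);
}
```

With the values from the gdb session, nscalars = {2, 2} and nunits = {4, 4}, the unguarded model_exact_div would hit the assertion, while model_possible_npeel returns 0. For nscalars = {8, 8} the division is exact and the result is 2.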