Hi,

For the following test-case:

typedef float __attribute__((__vector_size__ (16))) F;

F foo (F a, F b)
{
  F v = (F) { 9 };
  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
}
Compiling with -O2 results in the following ICE:

foo.c: In function ‘foo’:
foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
    6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x7f3185 wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long*, unsigned int, std::pair<rtx_def*, machine_mode> const&)
	../../gcc/gcc/rtl.h:2314
0x7f3185 wide_int_ref_storage<false, false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
	../../gcc/gcc/wide-int.h:1089
0x7f3185 generic_wide_int<wide_int_ref_storage<false, false> >::generic_wide_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
	../../gcc/gcc/wide-int.h:847
0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> > >::poly_int<std::pair<rtx_def*, machine_mode> >(poly_int_full, std::pair<rtx_def*, machine_mode> const&)
	../../gcc/gcc/poly-int.h:467
0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> > >::poly_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
	../../gcc/gcc/poly-int.h:453
0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
	../../gcc/gcc/rtl.h:2383
0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
	../../gcc/gcc/rtx-vector-builder.h:122
0xfd4e1b vector_builder<rtx_def*, machine_mode, rtx_vector_builder>::elt(unsigned int) const
	../../gcc/gcc/vector-builder.h:253
0xfd4d11 rtx_vector_builder::build()
	../../gcc/gcc/rtx-vector-builder.cc:73
0xc21d9c const_vector_from_tree
	../../gcc/gcc/expr.cc:13487
0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:11059
0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
	../../gcc/gcc/expr.h:310
0xaee682 expand_return
	../../gcc/gcc/cfgexpand.cc:3809
0xaee682 expand_gimple_stmt_1
	../../gcc/gcc/cfgexpand.cc:3918
0xaee682 expand_gimple_stmt
	../../gcc/gcc/cfgexpand.cc:4044
0xaf28f0 expand_gimple_basic_block
	../../gcc/gcc/cfgexpand.cc:6100
0xaf4996 execute
	../../gcc/gcc/cfgexpand.cc:6835

IIUC, the issue is that fold_vec_perm returns a vector with a float element type and res_nelts_per_pattern == 3. Expansion later ICEs in rtx_vector_builder::build() when it tries to derive element v[3], which is not present in the encoding:

  for (unsigned int i = 0; i < nelts; ++i)
    RTVEC_ELT (v, i) = elt (i);

The attached patch tries to fix this by returning false from valid_mask_for_fold_vec_perm_cst_p if sel has a stepped sequence and the input vector has a non-integral element type. So for VLA vectors with a non-integral element type, the result will only be built with a duplicated sequence (nelts_per_pattern < 3).

For VLS vectors, the stepped case still folds, because fold_vec_perm_cst then uses the "VLS exception" and sets res_npattern = res_nelts and res_nelts_per_pattern = 1. The above case is folded to:

F foo (F a, F b)
{
  <bb 2> [local count: 1073741824]:
  return { 0.0, 9.0e+0, 0.0, 0.0 };
}

But I am not sure whether this is entirely correct, since:

  tree res = out_elts.build ();

will canonicalize the encoding and may result in a stepped sequence (vector_builder::finalize() may reduce npatterns at the cost of increasing nelts_per_pattern)?

PS: This issue is now latent after the PR111648 fix. valid_mask_for_fold_vec_perm_cst_p with sel = {1, 0, 1, ...} returns false because the corresponding pattern in arg0 is not a natural stepped sequence, and the case then folds correctly using the VLS exception. However, I guess the underlying issue of dealing with non-integral element types in fold_vec_perm_cst still remains?

The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu, and on x86_64-linux-gnu.

Thanks,
Prathamesh
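To make the v[3] problem concrete, here is a minimal C sketch of how an element beyond the encoding is extrapolated. It is modelled on my reading of vector_builder::elt, and elements are assumed to be plain `long` values. The names struct encoded_vec and encoded_elt are hypothetical and do not exist in GCC. With nelts_per_pattern == 3, each pattern is a linear series, and its step is the difference between the last two encoded elements. In the RTL builder that step comes from rtx_vector_builder::step, which goes through wi::to_poly_wide (see the backtrace) and therefore only makes sense for integer constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a VECTOR_CST-style encoding: the full vector is
   NPATTERNS interleaved patterns, and only the first NELTS_PER_PATTERN
   elements of each pattern are stored.  ENCODED[k * npatterns + p] holds
   element k of pattern p.  */
struct encoded_vec
{
  const long *encoded;
  unsigned npatterns;
  unsigned nelts_per_pattern; /* 1, 2 or 3.  */
};

/* Return element I of the full vector.  With fewer than 3 elements per
   pattern, the last encoded element of the pattern repeats.  With 3, the
   pattern is extended linearly using the difference between its last two
   encoded elements as the step.  */
static long
encoded_elt (const struct encoded_vec *v, unsigned i)
{
  assert (v->npatterns > 0);
  assert (v->nelts_per_pattern >= 1 && v->nelts_per_pattern <= 3);

  unsigned encoded_nelts = v->npatterns * v->nelts_per_pattern;
  if (i < encoded_nelts)
    return v->encoded[i];

  /* Which pattern element I belongs to, and its position in that pattern.  */
  unsigned pattern = i % v->npatterns;
  unsigned count = i / v->npatterns;
  unsigned final_i = encoded_nelts - v->npatterns + pattern;
  long final = v->encoded[final_i];

  /* Duplicated pattern: no step needed.  */
  if (v->nelts_per_pattern <= 2)
    return final;

  /* Stepped pattern: this subtraction is the step that GCC computes via
     integer arithmetic, which is why a float element has no valid step.  */
  long prev = v->encoded[final_i - v->npatterns];
  return final + (long) (count - 2) * (final - prev);
}
```

For example, the single stepped pattern {0, 1, 2} extrapolates element 3 to 2 + 1 * (2 - 1) = 3. The single duplicated pattern {9} yields 9 for every index.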
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 82299bb7f1d..cedfc9616e9 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
   if (sel_nelts_per_pattern < 3)
     return true;
 
+  /* If SEL contains stepped sequence, ensure that we are dealing with
+     integral vector_cst.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0))))
+    return false;
+
   for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
     {
       poly_uint64 a1 = sel[pattern + sel_npatterns];
diff --git a/gcc/testsuite/gcc.dg/vect/pr111754.c b/gcc/testsuite/gcc.dg/vect/pr111754.c
new file mode 100644
index 00000000000..7c1c16875c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111754.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef float __attribute__((__vector_size__ (16))) F;
+
+F foo (F a, F b)
+{
+  F v = (F) { 9 };
+  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "return \{ 0.0, 9.0e\\+0, 0.0, 0.0 \}" "optimized" } } */
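As a sanity check on the constant that the new test scans for, here is a plain-C reference model of the shuffle. The helper shuffle4 is hypothetical and is not GCC code. It assumes two 4-element vectors and indexes into their concatenation, with negative "don't care" indices not modelled. Applying it to v = { 9, 0, 0, 0 } (the remaining elements of (F) { 9 } are zero) with selector {1, 0, 1, 2} should give { 0, 9, 0, 0 }, which matches the folded dump.

```c
#include <assert.h>
#include <stddef.h>

#define NELTS 4

/* Reference model of a two-operand shuffle of 4-element float vectors:
   an index k < NELTS selects a[k], and an index k >= NELTS selects
   b[k - NELTS].  */
static void
shuffle4 (const float a[NELTS], const float b[NELTS],
          const unsigned sel[NELTS], float out[NELTS])
{
  for (size_t i = 0; i < NELTS; i++)
    {
      assert (sel[i] < 2 * NELTS);
      out[i] = sel[i] < NELTS ? a[sel[i]] : b[sel[i] - NELTS];
    }
}
```

Because both operands are the same constant, only indices 0..3 matter here, and each output element is a plain copy of an input element. No step arithmetic on floats is needed, which is why a fully expanded (VLS) encoding is safe for this case.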