On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina <tamar.christ...@arm.com> wrote: >> > vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned >> intD.11>(vect__4.21_65); >> > vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned >> intD.11>(vect__4.22_63); >> > vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned >> intD.11>(vect__4.23_61); >> > vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned >> > intD.11>(vect__4.24_59); >> > >> > I suspect this patch will be quite bad for us performance wise as it >> > thinks it's as cheap to do all our integer operations on the vector side >> > with >> vectors of 1 element. But I'm still waiting for the perf numbers to confirm. >> >> Looks like the backend advertises that it can do POPCOUNT on V1DI. So SLP >> vectorization decides it can vectorize this without unrolling. > > We don't, POPCOUNT is only defined for vector modes V8QI and V16QI, we also > don't define support > For V1DI anywhere in the backend, we do however say we support V1DF, but > removing > That doesn't cause the ICE to go away. > >> Vectorization with V2DI is rejected: >> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not >> vectorizable. >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized: >> relevant stmt not supported: _8 = __builtin_popcountl (_5); ... >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: ***** Re-trying >> analysis with vector size 8 >> >> and that now succeeds (it probably didn't succeed before the patch). > > In the .optimized file, I see it vectorised it to > > vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned > intD.11>(vect__4.21_65); > vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned > intD.11>(vect__4.22_63); > vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned > intD.11>(vect__4.23_61); > vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned > intD.11>(vect__4.24_59); > _54 = POPCOUNT (vect__5.25_58); > _53 = POPCOUNT (vect__5.25_57); > > Which is something we just don't have a pattern for. Before this patch, it > was rejecting "long unsigned int" > With this patch is somehow thinks we support an integer vector of 1 element, > even though 1) we don't have an optab > Defined for this operation for POPCOUNT (or at all in aarch64 as far as I can > tell), and 2) we don't have it in our supported list of vector modes.
Here are the two popcount optab aarch64 has: (define_mode_iterator GPI [SI DI]) (define_expand "popcount<mode>2" [(match_operand:GPI 0 "register_operand") (match_operand:GPI 1 "register_operand")] (define_insn "popcount<mode>2" (define_mode_iterator VB [V8QI V16QI]) [(set (match_operand:VB 0 "register_operand" "=w") (popcount:VB (match_operand:VB 1 "register_operand" "w")))] As you can see we only define popcount optab for SI, DI, V8QI and V16QI. (note SI and DI uses the V8QI and V16QI during the expansion but that is a different story). Maybe somehow the vectorizer is thinking V1DI and DI are interchangeable in some places. Thanks, Andrew > > So I don't quite understand how we end up with this expression. > > Regards, > Tamar > >> >> Richard. >> >> > ________________________________________ >> > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> >> on >> > behalf of Andreas Schwab <sch...@linux-m68k.org> >> > Sent: Saturday, September 2, 2017 10:58 PM >> > To: Jon Beniston >> > Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org >> > Subject: Re: [RFC, vectorizer] Allow single element vector types for >> > vector reduction operations >> > >> > On Aug 30 2017, "Jon Beniston" <j...@beniston.com> wrote: >> > >> > > gcc/ >> > > 2017-08-30 Jon Beniston <j...@beniston.com> >> > > Richard Biener <rguent...@suse.de> >> > > >> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >> > > index cfdb72c..5ebeac2 100644 >> > > --- a/gcc/tree-vect-patterns.c >> > > +++ b/gcc/tree-vect-patterns.c >> > > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func >> *recog_func, >> > > loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> > > >> > > if (VECTOR_BOOLEAN_TYPE_P (type_in) >> > > - || VECTOR_MODE_P (TYPE_MODE (type_in))) >> > > + || VECTOR_TYPE_P (type_in)) >> > > { >> > > /* No need to check target support (already checked by the pattern >> > > recognition function). */ diff --git >> > > a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index >> > > 013fb1f..fc62efb 100644 >> > > --- a/gcc/tree-vect-stmts.c >> > > +++ b/gcc/tree-vect-stmts.c >> > > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree >> > > scalar_type, unsigned size) >> > > else >> > > simd_mode = mode_for_vector (inner_mode, size / nbytes); >> > > nunits = GET_MODE_SIZE (simd_mode) / nbytes; >> > > - if (nunits <= 1) >> > > + /* NOTE: nunits == 1 is allowed to support single element vector >> > > types. >> > > */ >> > > + if (nunits < 1) >> > > return NULL_TREE; >> > > >> > > vectype = build_vector_type (scalar_type, nunits); >> > > >> > > >> > > >> > >> > That breaks vect/pr68577.c on aarch64. >> > >> > during RTL pass: expand >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In function >> 'slp_test': >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12: >> > internal compiler error: in simplify_subreg, at simplify-rtx.c:6050 >> > 0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode, >> unsigned int) >> > ../../gcc/simplify-rtx.c:6049 >> > 0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, >> unsigned int) >> > ../../gcc/simplify-rtx.c:6278 >> > 0x81d277 store_bit_field_1 >> > ../../gcc/expmed.c:798 >> > 0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long, unsigned >> long, unsigned long, machine_mode, rtx_def*, bool) >> > ../../gcc/expmed.c:1133 >> > 0x840bf7 store_field >> > ../../gcc/expr.c:6950 >> > 0x84792f store_constructor_field >> > ../../gcc/expr.c:6142 >> > 0x83edbf store_constructor >> > ../../gcc/expr.c:6726 >> > 0x840443 expand_constructor >> > ../../gcc/expr.c:8027 >> > 0x82d5bf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, >> expand_modifier, rtx_def**, bool) >> > ../../gcc/expr.c:10133 >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, >> expand_modifier, rtx_def**, bool) >> > ../../gcc/expr.c:9819 >> > 0x82dadf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, >> expand_modifier, rtx_def**, bool) >> > ../../gcc/expr.c:10942 >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, >> expand_modifier, rtx_def**, bool) >> > ../../gcc/expr.c:9819 >> > 0x83d197 expand_expr >> > ../../gcc/expr.h:276 >> > 0x83d197 expand_assignment(tree_node*, tree_node*, bool) >> > ../../gcc/expr.c:4971 >> > 0x71e2f3 expand_gimple_stmt_1 >> > ../../gcc/cfgexpand.c:3653 >> > 0x71e2f3 expand_gimple_stmt >> > ../../gcc/cfgexpand.c:3751 >> > 0x721cdb expand_gimple_basic_block >> > ../../gcc/cfgexpand.c:5750 >> > 0x726b07 execute >> > ../../gcc/cfgexpand.c:6357 >> > >> > Andreas. >> > >> > -- >> > Andreas Schwab, sch...@linux-m68k.org >> > GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 >> > 4ED5 "And now for something completely different." >> > >> > >> >> -- >> Richard Biener <rguent...@suse.de> >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, >> HRB 21284 (AG Nuernberg)