Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, Sep 5, 2022 at 11:27 AM Prathamesh Kulkarni wrote:
>
> On Mon, 5 Sept 2022 at 14:39, Richard Biener wrote:
> >
> > On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni wrote:
> > >
> > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni wrote:
> > > >
> > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni wrote:
> > > > >
> > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> > > > > > >
> > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > > > > > > >
> > > > > > > > Prathamesh Kulkarni writes:
> > > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > > > > > > >>
> > > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > > > > > > >> > >
> > > > > > > > >> > >    /* If result vector has greater length than input vector,
> > > > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > > > >> > > +
> > > > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > > > >> > > +    {
> > > > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > > > >> > > +        return NULL_TREE;
> > > > > > > > >> > > +    }
> > > > > > > > >> > >
> > > > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > > > >> > > explicitly only the case of a period that's same as the
> > > > > > > > >> > > element count in the input vectors.
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > > > > > > >> > >             pp_space (pp);
> > > > > > > > >> > >           }
> > > > > > > > >> > >       }
> > > > > > > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > > > >> > > +       pp_string (pp, ", ... ");
> > > > > > > > >> > >       pp_right_brace (pp);
> > > > > > > > >> > >
> > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > > > >> > Well, it got created for the following case after folding:
> > > > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > {
> > > > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > }
> > > > > > > > >> >
> > > > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > > > >> > v = {a, b, c, d}
> > > > > > > > >> > lhs = VEC_PERM_EXPR
> > > > > > > > >> >
> > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > > > >> > need_ctor is thus true:
> > > > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > > > >>
> > > > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > > > constructor looks pretty bad.
> > > > > > > > > Should we avoid folding VLA constructors for now?
> > > > > > > >
> > > > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > > > >
> > > > > > > > > I guess these are 2 different issues:
> > > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for abo
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, 5 Sept 2022 at 14:39, Richard Biener wrote:
>
> On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni wrote:
> >
> > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni wrote:
> > >
> > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni wrote:
> > > >
> > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
> > > > >
> > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> > > > > >
> > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > > > > > >
> > > > > > > Prathamesh Kulkarni writes:
> > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > > > > > >>
> > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > > > > > >> >
> > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > > > > > >> > >
> > > > > > > >> > >    /* If result vector has greater length than input vector,
> > > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > > >> > > +
> > > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > > >> > > +    {
> > > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > > >> > > +        return NULL_TREE;
> > > > > > > >> > > +    }
> > > > > > > >> > >
> > > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > > >> > > explicitly only the case of a period that's same as the
> > > > > > > >> > > element count in the input vectors.
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > > > > > >> > >             pp_space (pp);
> > > > > > > >> > >           }
> > > > > > > >> > >       }
> > > > > > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > > >> > > +       pp_string (pp, ", ... ");
> > > > > > > >> > >       pp_right_brace (pp);
> > > > > > > >> > >
> > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > > >> > Well, it got created for the following case after folding:
> > > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > {
> > > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >> > }
> > > > > > > >> >
> > > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > > >> > v = {a, b, c, d}
> > > > > > > >> > lhs = VEC_PERM_EXPR
> > > > > > > >> >
> > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > > >> > need_ctor is thus true:
> > > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > > >>
> > > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > > constructor looks pretty bad.
> > > > > > > > Should we avoid folding VLA constructors for now?
> > > > > > >
> > > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > > >
> > > > > > > > I guess these are 2 different issues:
> > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > > > >
> > > > > > > > For (a), I think the issue with using:
> > > > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > > > in previous patch, was that op2's type will change to match
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni wrote:
>
> On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni wrote:
> >
> > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni wrote:
> > >
> > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
> > > >
> > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> > > > >
> > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > > > > >
> > > > > > Prathamesh Kulkarni writes:
> > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > > > > >>
> > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > > > > >> >
> > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > > > > >> > >
> > > > > > >> > >    /* If result vector has greater length than input vector,
> > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > >> > > +
> > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > >> > > +    {
> > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > >> > > +        return NULL_TREE;
> > > > > > >> > > +    }
> > > > > > >> > >
> > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > >> > > explicitly only the case of a period that's same as the
> > > > > > >> > > element count in the input vectors.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > > > > >> > >             pp_space (pp);
> > > > > > >> > >           }
> > > > > > >> > >       }
> > > > > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > >> > > +       pp_string (pp, ", ... ");
> > > > > > >> > >       pp_right_brace (pp);
> > > > > > >> > >
> > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > >> > Well, it got created for the following case after folding:
> > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > {
> > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >> > }
> > > > > > >> >
> > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > >> > v = {a, b, c, d}
> > > > > > >> > lhs = VEC_PERM_EXPR
> > > > > > >> >
> > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > >> > need_ctor is thus true:
> > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > >>
> > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > constructor looks pretty bad.
> > > > > > > Should we avoid folding VLA constructors for now?
> > > > > >
> > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > >
> > > > > > > I guess these are 2 different issues:
> > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > > >
> > > > > > > For (a), I think the issue with using:
> > > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > > if we go thru
> > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > > in fold_vec_perm.
> > > > > > >
> > > > > > > IIUC, for lhs = VEC
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni wrote:
>
> On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni wrote:
> >
> > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
> > >
> > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> > > >
> > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > > > >
> > > > > Prathamesh Kulkarni writes:
> > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > > > >>
> > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > > > >> >
> > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > > > >> > >
> > > > > >> > >    /* If result vector has greater length than input vector,
> > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > >> > > +     The intent is to permute input vectors, and
> > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > >> > > +
> > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > >> > > +    {
> > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > >> > > +        return NULL_TREE;
> > > > > >> > > +    }
> > > > > >> > >
> > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > >> > > explicitly only the case of a period that's same as the
> > > > > >> > > element count in the input vectors.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > > > >> > >             pp_space (pp);
> > > > > >> > >           }
> > > > > >> > >       }
> > > > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > >> > > +       pp_string (pp, ", ... ");
> > > > > >> > >       pp_right_brace (pp);
> > > > > >> > >
> > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > >> > Well, it got created for the following case after folding:
> > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > >> > {
> > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >> > }
> > > > > >> >
> > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > >> > v = {a, b, c, d}
> > > > > >> > lhs = VEC_PERM_EXPR
> > > > > >> >
> > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > >> > need_ctor is thus true:
> > > > > >> > lhs = {a, b, c, d, ...}
> > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > >>
> > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > >> but do we actually make sure to do this duplication?
> > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > constructor looks pretty bad.
> > > > > > Should we avoid folding VLA constructors for now?
> > > > >
> > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > we even use that though (perhaps we do and I've forgotten).
> > > > >
> > > > > > I guess these are 2 different issues:
> > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > >
> > > > > > For (a), I think the issue with using:
> > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > if we go thru
> > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > in fold_vec_perm.
> > > > > >
> > > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the
> > > > > > following semantics:
> > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > > >
> > > > > Yeah.
> > > > >
> > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni wrote:
>
> On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
> >
> > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> > >
> > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > > >
> > > > Prathamesh Kulkarni writes:
> > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > > >>
> > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > > >> >
> > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > > >> > >
> > > > >> > >    /* If result vector has greater length than input vector,
> > > > >> > > +     then allow permuting two vectors as long as:
> > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > >> > > +     The intent is to permute input vectors, and
> > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > >> > > +
> > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > >> > > +    {
> > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > >> > > +        return NULL_TREE;
> > > > >> > > +    }
> > > > >> > >
> > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > >> > > explicitly only the case of a period that's same as the
> > > > >> > > element count in the input vectors.
> > > > >> > >
> > > > >> > >
> > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > > >> > >             pp_space (pp);
> > > > >> > >           }
> > > > >> > >       }
> > > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > >> > > +       pp_string (pp, ", ... ");
> > > > >> > >       pp_right_brace (pp);
> > > > >> > >
> > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > >> > Well, it got created for the following case after folding:
> > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > >> > {
> > > > >> >   int32x4_t v = {a, b, c, d};
> > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >> > }
> > > > >> >
> > > > >> > The svld1rq_s32 call gets folded to:
> > > > >> > v = {a, b, c, d}
> > > > >> > lhs = VEC_PERM_EXPR
> > > > >> >
> > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > >> > need_ctor is thus true:
> > > > >> > lhs = {a, b, c, d, ...}
> > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > >>
> > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > >> but do we actually make sure to do this duplication?
> > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > constructor looks pretty bad.
> > > > > Should we avoid folding VLA constructors for now?
> > > >
> > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > we even use that though (perhaps we do and I've forgotten).
> > > >
> > > > > I guess these are 2 different issues:
> > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > >
> > > > > For (a), I think the issue with using:
> > > > > res_type = gimple_assign_lhs (stmt)
> > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > if we go thru
> > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > in fold_vec_perm.
> > > > >
> > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the
> > > > > following semantics:
> > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > >
> > > > Yeah.
> > > >
> > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > >                               TYPE_VECTOR_SUBPARTS (op2))
> > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > Does that look reasonable?
> > > > >
> > > > > If we need a cast from res_ty
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote:
>
> On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
> >
> > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> > >
> > > Prathamesh Kulkarni writes:
> > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > > >>
> > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > > >> >
> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > > >> > >
> > > >> > >    /* If result vector has greater length than input vector,
> > > >> > > +     then allow permuting two vectors as long as:
> > > >> > > +     a) sel.nelts_per_pattern == 1
> > > >> > > +     b) sel.npatterns == len of input vector.
> > > >> > > +     The intent is to permute input vectors, and
> > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > >> > > +
> > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > >> > > +    {
> > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > >> > > +        return NULL_TREE;
> > > >> > > +    }
> > > >> > >
> > > >> > > so the only case you add is non-VLA to VLA and there
> > > >> > > explicitly only the case of a period that's same as the
> > > >> > > element count in the input vectors.
> > > >> > >
> > > >> > >
> > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > >> > >             pp_space (pp);
> > > >> > >           }
> > > >> > >       }
> > > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > >> > > +       pp_string (pp, ", ... ");
> > > >> > >       pp_right_brace (pp);
> > > >> > >
> > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > >> > Well, it got created for the following case after folding:
> > > >> > svint32_t f2(int a, int b, int c, int d)
> > > >> > {
> > > >> >   int32x4_t v = {a, b, c, d};
> > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > }
> > > >> >
> > > >> > The svld1rq_s32 call gets folded to:
> > > >> > v = {a, b, c, d}
> > > >> > lhs = VEC_PERM_EXPR
> > > >> >
> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > >> > need_ctor is thus true:
> > > >> > lhs = {a, b, c, d, ...}
> > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > >>
> > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > >> but do we actually make sure to do this duplication?
> > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > constructor looks pretty bad.
> > > > Should we avoid folding VLA constructors for now?
> > >
> > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > we even use that though (perhaps we do and I've forgotten).
> > >
> > > > I guess these are 2 different issues:
> > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > >
> > > > For (a), I think the issue with using:
> > > > res_type = gimple_assign_lhs (stmt)
> > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > if we go thru
> > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > in fold_vec_perm.
> > > >
> > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the
> > > > following semantics:
> > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > >
> > > Yeah.
> > >
> > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > >                               TYPE_VECTOR_SUBPARTS (op2))
> > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > Does that look reasonable?
> > > >
> > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > width vectors
> > > > with len(lhs_type) being a multiple of len(res_type).
> > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > >
> > > Yes, that's not supported as a cast.  If the compiler knows the
> > > length
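To illustrate the two readings of lhs = {a, b, c, d, ...} that come up in this thread, here is a minimal standalone C++ sketch. It is a toy model only: a vector is a std::vector<int> whose runtime length is passed in explicitly, and the function names prefix_then_zeros and replicate_block are hypothetical, not GCC or ACLE APIs. The first function follows the description of what a CONSTRUCTOR node could represent (a fixed-length sequence followed by zeros); the second follows the load-and-replicate behaviour that svld1rq is meant to have.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reading 1 (CONSTRUCTOR-like): the listed elements form a fixed-length
// prefix and every remaining lane of the runtime-length vector is zero.
std::vector<int> prefix_then_zeros (const std::vector<int> &elts,
                                    std::size_t runtime_len)
{
  assert (elts.size () <= runtime_len);
  std::vector<int> out (runtime_len, 0);
  for (std::size_t i = 0; i < elts.size (); ++i)
    out[i] = elts[i];
  return out;
}

// Reading 2 (replicate): the block is repeated until the vector is full.
// Assumes the runtime length is a multiple of the block length.
std::vector<int> replicate_block (const std::vector<int> &block,
                                  std::size_t runtime_len)
{
  assert (!block.empty () && runtime_len % block.size () == 0);
  std::vector<int> out (runtime_len);
  for (std::size_t i = 0; i < runtime_len; ++i)
    out[i] = block[i % block.size ()];
  return out;
}
```

The two functions agree only when the runtime length equals the block length, which is one way to see why printing the folded value as {a, b, c, d, ...} is ambiguous.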
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote:
>
> On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote:
> >
> > Prathamesh Kulkarni writes:
> > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
> > >>
> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
> > >> >
> > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> > >> > >
> > >> > >    /* If result vector has greater length than input vector,
> > >> > > +     then allow permuting two vectors as long as:
> > >> > > +     a) sel.nelts_per_pattern == 1
> > >> > > +     b) sel.npatterns == len of input vector.
> > >> > > +     The intent is to permute input vectors, and
> > >> > > +     dup the elements in resulting vector to target vector length.  */
> > >> > > +
> > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > >> > > +    {
> > >> > > +      nelts = sel.encoding ().npatterns ();
> > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > >> > > +        return NULL_TREE;
> > >> > > +    }
> > >> > >
> > >> > > so the only case you add is non-VLA to VLA and there
> > >> > > explicitly only the case of a period that's same as the
> > >> > > element count in the input vectors.
> > >> > >
> > >> > >
> > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > >> > >             pp_space (pp);
> > >> > >           }
> > >> > >       }
> > >> > > +     if (VECTOR_TYPE_P (TREE_TYPE (node))
> > >> > > +         && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > >> > > +       pp_string (pp, ", ... ");
> > >> > >       pp_right_brace (pp);
> > >> > >
> > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > >> > Well, it got created for the following case after folding:
> > >> > svint32_t f2(int a, int b, int c, int d)
> > >> > {
> > >> >   int32x4_t v = {a, b, c, d};
> > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >> > }
> > >> >
> > >> > The svld1rq_s32 call gets folded to:
> > >> > v = {a, b, c, d}
> > >> > lhs = VEC_PERM_EXPR
> > >> >
> > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > >> > need_ctor is thus true:
> > >> > lhs = {a, b, c, d, ...}
> > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > >>
> > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > >> but do we actually make sure to do this duplication?
> > > I am not sure. As mentioned above, the current code-gen for VLA
> > > constructor looks pretty bad.
> > > Should we avoid folding VLA constructors for now?
> >
> > VLA constructors aren't really a thing.  At least, the only VLA vector
> > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > sequence at the start of an otherwise zero vector.  I'm not sure
> > we even use that though (perhaps we do and I've forgotten).
> >
> > > I guess these are 2 different issues:
> > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > >
> > > For (a), I think the issue with using:
> > > res_type = gimple_assign_lhs (stmt)
> > > in previous patch, was that op2's type will change to match tgt_units,
> > > if we go thru
> > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > in fold_vec_perm.
> > >
> > > IIUC, for lhs = VEC_PERM_EXPR, we now have the
> > > following semantics:
> > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> >
> > Yeah.
> >
> > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > >                               TYPE_VECTOR_SUBPARTS (op2))
> > > so it has same element type as arg0 (and arg1) and len of op2.
> > > Does that look reasonable?
> > >
> > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > width vectors
> > > with len(lhs_type) being a multiple of len(res_type).
> > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> >
> > Yes, that's not supported as a cast.  If the compiler knows the
> > length of the "VLA" vector then it's not VLA.  If it doesn't
> > know the length of the VLA vector then the sizes could be different
> > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > different (preventing pointwise CONVERT_EXPRs).
> >
> > > or from VLA vector of one type to VL
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Tue, 9 Aug 2022 at 18:42, Richard Biener > > wrote: > >> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > >> wrote: > >> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener > >> > w>> > > > >> > > > >> > > /* If result vector has greater length than input vector, > >> > > + then allow permuting two vectors as long as: > >> > > + a) sel.nelts_per_pattern == 1 > >> > > + b) sel.npatterns == len of input vector. > >> > > + The intent is to permute input vectors, and > >> > > + dup the elements in resulting vector to target vector length. */ > >> > > + > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0 > >> > > +{ > >> > > + nelts = sel.encoding ().npatterns (); > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE > >> > > (arg0) > >> > > + return NULL_TREE; > >> > > +} > >> > > > >> > > so the only case you add is non-VLA to VLA and there > >> > > explicitely only the case of a period that's same as the > >> > > element count in the input vectors. > >> > > > >> > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > >> > > node, int spc, dump_flags_t flags, > >> > > pp_space (pp); > >> > > } > >> > > } > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > >> > > + pp_string (pp, ", ... "); > >> > > pp_right_brace (pp); > >> > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? 
> >> > Well, it got created for the following case after folding: > >> > svint32_t f2(int a, int b, int c, int d) > >> > { > >> > int32x4_t v = {a, b, c, d}; > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > >> > } > >> > > >> > The svld1rq_s32 call gets folded to: > >> > v = {a, b, c, d} > >> > lhs = VEC_PERM_EXPR > >> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > >> > VLA constructor, since elements in v (in_elts) are not constant, and > >> > need_ctor is thus true: > >> > lhs = {a, b, c, d, ...} > >> > I added "..." to make it more explicit that it's a VLA constructor. > >> > >> But I doubt we do anything reasonable with such a beast? Do we? > >> I suppose it's like a vec_duplicate if you view it as V1TImode > >> but do we actually make sure to do this duplication? > > I am not sure. As mentioned above, the current code-gen for VLA > > constructor looks pretty bad. > > Should we avoid folding VLA constructors for now ? > > VLA constructors aren't really a thing. At least, the only VLA vector > you could represent with current CONSTRUCTOR nodes is a fixed-length > sequence at the start of an otherwise zero vector. I'm not sure > we even use that though (perhaps we do and I've forgotten). > > > I guess these are 2 different issues: > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > For (a), I think the issue with using: > > res_type = gimple_assign_lhs (stmt) > > in previous patch, was that op2's type will change to match tgt_units, > > if we go thru > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > in fold_vec_perm. > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the > > following semantics: > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > Yeah. 
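For readers following along, the selector condition from the patch comment quoted above (nelts_per_pattern == 1 and npatterns == length of the input vector) can be illustrated with a small standalone sketch. This is not GCC code: SelEncoding, dup_fold_allowed and expand are hypothetical names made up for illustration, and the sketch assumes that with one element per pattern the full selector is simply the encoded elements repeated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an encoded selector (not GCC's vec_perm_indices).
// Assumption: with nelts_per_pattern == 1, element i of the full selector
// is base[i % npatterns], i.e. the encoded elements simply repeat.
struct SelEncoding {
  std::vector<int> base;           // the explicitly encoded elements
  std::size_t npatterns;           // number of interleaved patterns
  std::size_t nelts_per_pattern;   // encoded elements per pattern
};

// Mirrors the two conditions in the quoted comment:
// (a) one element per pattern, (b) as many patterns as input elements.
bool dup_fold_allowed(const SelEncoding &sel, std::size_t input_len) {
  return sel.nelts_per_pattern == 1 && sel.npatterns == input_len;
}

// Expand the encoding to a concrete length, as it would look once the
// runtime vector length is known. Only valid for nelts_per_pattern == 1.
std::vector<int> expand(const SelEncoding &sel, std::size_t full_len) {
  assert(sel.nelts_per_pattern == 1 && sel.npatterns == sel.base.size());
  std::vector<int> out(full_len);
  for (std::size_t i = 0; i < full_len; ++i)
    out[i] = sel.base[i % sel.npatterns];
  return out;
}
```

Under these assumptions the selector { 0, 1, 2, 3, ... } from the thread expands to 0, 1, 2, 3, 0, 1, 2, 3 for an 8-element result, which is the "dup" of a 4-element input that the comment describes.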
> > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > TYPE_VECTOR_SUBPARTS (op2)) > > so it has same element type as arg0 (and arg1) and len of op2. > > Does that look reasonable ? > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > width vectors > > with len(lhs_type) being a multiple of len(res_type). > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > Yes, that's not supported as a cast. If the compiler knows the > length of the "VLA" vector then it's not VLA. If it doesn't > know the length of the VLA vector then the sizes could be different > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > different (preventing pointwise CONVERT_EXPRs). > > > or from VLA vector of one type to VLA vector of other type ? > > That's supported though. They work just like VLS vectors: if the sizes > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > are the same then we can do pointwise conversions (e.g. element-by-element > extensions, truncations, conversions to float, convers
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
Prathamesh Kulkarni writes:
> On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote:
>>
>> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
>> >
>> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
>> > >
>> > >
>> > > /* If result vector has greater length than input vector,
>> > > +     then allow permuting two vectors as long as:
>> > > +     a) sel.nelts_per_pattern == 1
>> > > +     b) sel.npatterns == len of input vector.
>> > > +     The intent is to permute input vectors, and
>> > > +     dup the elements in resulting vector to target vector length. */
>> > > +
>> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
>> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
>> > > +    {
>> > > +      nelts = sel.encoding ().npatterns ();
>> > > +      if (sel.encoding ().nelts_per_pattern () != 1
>> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
>> > > +        return NULL_TREE;
>> > > +    }
>> > >
>> > > so the only case you add is non-VLA to VLA and there
>> > > explicitly only the case of a period that's same as the
>> > > element count in the input vectors.
>> > >
>> > >
>> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>> > >                   pp_space (pp);
>> > >                 }
>> > >             }
>> > > +         if (VECTOR_TYPE_P (TREE_TYPE (node))
>> > > +             && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
>> > > +           pp_string (pp, ", ... ");
>> > >           pp_right_brace (pp);
>> > >
>> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they?
>> > Well, it got created for the following case after folding:
>> > svint32_t f2(int a, int b, int c, int d)
>> > {
>> >   int32x4_t v = {a, b, c, d};
>> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > }
>> >
>> > The svld1rq_s32 call gets folded to:
>> > v = {a, b, c, d}
>> > lhs = VEC_PERM_EXPR
>> >
>> > fold_vec_perm then folds the above VEC_PERM_EXPR to
>> > VLA constructor, since elements in v (in_elts) are not constant, and
>> > need_ctor is thus true:
>> > lhs = {a, b, c, d, ...}
>> > I added "..." to make it more explicit that it's a VLA constructor.
>>
>> But I doubt we do anything reasonable with such a beast? Do we?
>> I suppose it's like a vec_duplicate if you view it as V1TImode
>> but do we actually make sure to do this duplication?
> I am not sure. As mentioned above, the current code-gen for VLA
> constructor looks pretty bad.
> Should we avoid folding VLA constructors for now ?

VLA constructors aren't really a thing. At least, the only VLA vector
you could represent with current CONSTRUCTOR nodes is a fixed-length
sequence at the start of an otherwise zero vector. I'm not sure
we even use that though (perhaps we do and I've forgotten).

> I guess these are 2 different issues:
> (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> (b) Extending fold_vec_perm to handle vectors with differing lengths.
>
> For (a), I think the issue with using:
> res_type = gimple_assign_lhs (stmt)
> in previous patch, was that op2's type will change to match tgt_units,
> if we go thru
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> and may thus not be same as len(lhs_type) anymore, and hit the assert
> in fold_vec_perm.
>
> IIUC, for lhs = VEC_PERM_EXPR, we now have the
> following semantics:
> (1) Element types for lhs, rhs1 and rhs2 should be the same.
> (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).

Yeah.
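As an illustration of rules (1) and (2) above, here is a minimal standalone model of a permute whose result length follows the mask rather than the inputs. It is only a sketch, not the GCC implementation: vec_perm is a hypothetical helper, a single element type (int) stands in for rule (1), and wrapping selector indices modulo the combined input length is an assumption carried over from the equal-length case.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of lhs = VEC_PERM_EXPR <rhs1, rhs2, mask>.
// Rule (1): one element type throughout (int here).
// Rule (2): len(lhs) == len(mask) holds by construction;
//           len(rhs1) == len(rhs2) is checked explicitly.
std::optional<std::vector<int>> vec_perm(const std::vector<int> &rhs1,
                                         const std::vector<int> &rhs2,
                                         const std::vector<std::size_t> &mask) {
  if (rhs1.size() != rhs2.size() || rhs1.empty())
    return std::nullopt;  // inputs must have the same, non-zero length
  const std::size_t n = rhs1.size();
  std::vector<int> lhs;
  lhs.reserve(mask.size());
  for (std::size_t idx : mask) {
    idx %= 2 * n;  // assumption: indices wrap over the rhs1:rhs2 concatenation
    lhs.push_back(idx < n ? rhs1[idx] : rhs2[idx - n]);
  }
  return lhs;
}
```

With a 4-element input used for both operands and the mask 0, 1, 2, 3, 0, 1, 2, 3, the model returns the input repeated twice, which is the fixed-width-to-longer "dup" case discussed in this thread.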
> The attached patch changes res_type from TREE_TYPE (arg0) to following:
> res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
>                               TYPE_VECTOR_SUBPARTS (op2))
> so it has same element type as arg0 (and arg1) and len of op2.
> Does that look reasonable ?
>
> If we need a cast from res_type to lhs_type, then both would be fixed
> width vectors
> with len(lhs_type) being a multiple of len(res_type).
> IIUC, we don't support casting from VLA vector to/from fixed width vector,

Yes, that's not supported as a cast. If the compiler knows the
length of the "VLA" vector then it's not VLA. If it doesn't
know the length of the VLA vector then the sizes could be different
(preventing VIEW_CONVERT_EXPR) and the number of elements could be
different (preventing pointwise CONVERT_EXPRs).

> or from VLA vector of one type to VLA vector of other type ?

That's supported though. They work just like VLS vectors: if the sizes
are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
are the same then we can do pointwise conversions (e.g. element-by-element
extensions, truncations, conversions to float, conversions to integer, etc).

> Currently, if op2 is VLA, and we enter the branch:
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> then I think it will bail out because op2_units will not be a compile
> time constant,
> and constant_multiple_p (op2_units, tgt_units, &facto
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote: > > On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > wrote: > > > > On Mon, 8 Aug 2022 at 14:27, Richard Biener > > wrote: > > > > > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni > > > wrote: > > > > > > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener > > > > wrote: > > > > > > > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > > > > wrote: > > > > > > > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > > > > wrote: > > > > > > > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Richard Biener writes: > > > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > > > > > > > > >> wrote: > > > > > > > > > >> > > > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via > > > > > > > > > >> > Gcc-patches > > > > > > > > > >> > wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > Hi Richard, > > > > > > > > > >> > > For the following test: > > > > > > > > > >> > > > > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > > > > >> > > { > > > > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > > >> > > } > > > > > > > > > >> > > > > > > > > > > > >> > > The compiler emits following ICE with -O3 > > > > > > > > > >> > > -mcpu=generic+sve: > > > > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in > > > > > > > > > >> > > ‘view_convert_expr’ > > > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > > > > >> > > | ^~ > > > > > > > > > >> > > svint32_t > > > > > > > > > >> > 
> __Int32x4_t > > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple > > > > > > > > > >> > > failed > > > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > > > > >> > > 0xe93ccb execute_todo > > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > > > > >> > > > > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to > > > > > > > > > >> > > vec_perm_expr, we have: > > > > > > > > > >> > > int32x4_t v; > > > > > > > > > >> > > __Int32x4_t _1; > > > > > > > > > >> > > svint32_t _9; > > > > > > > > > >> > > vector(4) int _11; > > > > > > > > > >> > > > > > > > > > > > >> > >: > > > > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > > > > >> > > v_12 = _1; > > > > > > > > > >> > > _11 = v_12; > > > > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... 
}>; > > > > > > > > > >> > > return _9; > > > > > > > > > >> > > > > > > > > > > > >> > > During forwprop, simplify_permutation simplifies > > > > > > > > > >> > > vec_perm_expr to > > > > > > > > > >> > > view_convert_expr, > > > > > > > > > >> > > and the end result becomes: > > > > > > > > > >> > > svint32_t _7; > > > > > > > > > >> > > __Int32x4_t _8; > > > > > > > > > >> > > > > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > > > > >> > > ;;pred: ENTRY > > > > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > > >> > > return _7; > > > > > > > > > >> > > ;;succ: EXIT > > > > > > > > > >> > > > > > > > > > > > >> > > which causes the error duing verify_gimple since > > > > > > > > > >> > > VIEW_CONVERT_EXPR > > > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > > > > >> > > > > > > > > > > > >> > > The attached patch disables simplification of > > > > > > > > > >> > > VEC_PERM_EXPR > > > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non > > > > > > > > > >> > > compatible types, > > > > > > > > > >> > > which resolves ICE, but am not sure if it's the > > > > > > > > > >> > > correct approach ? > > > > > > > > > >> > > > > > > > > > > >> > It for sure papers over the issue. I think the error > > > > > > > > > >> > happens earlier, > > > > > > > > > >> > the V_C_E should have been built with the type of the > > > > > > > > > >> > VEC_PERM_EXPR > > > > > > > > > >> > which is the type of the LHS. But then you probably run > > > > > > > > > >> > into the > > > > > > > > > >> > di
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni wrote:
>
> On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
> >
> > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni wrote:
> > >
> > > On Thu, 21 Jul 2022 at 12:21, Richard Biener wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni wrote:
> > > > >
> > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener wrote:
> > > > > >
> > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford wrote:
> > > > > > > >
> > > > > > > > Richard Biener writes:
> > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni wrote:
> > > > > > > > >>
> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi Richard,
> > > > > > > > >> > > For the following test:
> > > > > > > > >> > >
> > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > > {
> > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > > }
> > > > > > > > >> > >
> > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > >       |           ^~
> > > > > > > > >> > > svint32_t
> > > > > > > > >> > > __Int32x4_t
> > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > > > > >> > >
> > > > > > > > >> > > This happens because, after folding svld1rq_s32 to
> > > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > > >> > >   int32x4_t v;
> > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > >> > >   svint32_t _9;
> > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > >> > >
> > > > > > > > >> > >   :
> > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > >> > >   v_12 = _1;
> > > > > > > > >> > >   _11 = v_12;
> > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > >> > >   return _9;
> > > > > > > > >> > >
> > > > > > > > >> > > During forwprop, simplify_permutation simplifies
> > > > > > > > >> > > vec_perm_expr to view_convert_expr,
> > > > > > > > >> > > and the end result becomes:
> > > > > > > > >> > >   svint32_t _7;
> > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > >> > >
> > > > > > > > >> > > ;; basic block 2, loop depth 0
> > > > > > > > >> > > ;;    pred:       ENTRY
> > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > >   return _7;
> > > > > > > > >> > > ;;    succ:       EXIT
> > > > > > > > >> > >
> > > > > > > > >> > > which causes the error during verify_gimple since
> > > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
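To see why the verifier rejects a VIEW_CONVERT_EXPR between svint32_t and int32x4_t, it can help to model vector lengths as c0 + c1 * x, where x >= 0 is only known at run time. The following is a rough standalone sketch under that assumption. PolyLen, known_equal, maybe_greater and view_convert_ok are hypothetical names chosen for illustration (they are not GCC's poly_int API), and modelling svint32_t as 4 + 4x elements is likewise an assumption.

```cpp
#include <cassert>

// Hypothetical two-coefficient length: value = c0 + c1 * x, with x >= 0
// unknown at compile time. Fixed-width vectors have c1 == 0.
struct PolyLen {
  unsigned c0;
  unsigned c1;
};

// True only if the two lengths are equal for every possible x.
bool known_equal(PolyLen a, PolyLen b) {
  return a.c0 == b.c0 && a.c1 == b.c1;
}

// True if a can exceed b for at least one x (x == 0 or x large enough).
bool maybe_greater(PolyLen a, PolyLen b) {
  return a.c0 > b.c0 || a.c1 > b.c1;
}

// A size-preserving reinterpretation, in the spirit of VIEW_CONVERT_EXPR,
// is only sound when both sides provably have the same number of bits.
bool view_convert_ok(PolyLen from_elts, PolyLen to_elts, unsigned elt_bits) {
  PolyLen from_bits{from_elts.c0 * elt_bits, from_elts.c1 * elt_bits};
  PolyLen to_bits{to_elts.c0 * elt_bits, to_elts.c1 * elt_bits};
  return known_equal(from_bits, to_bits);
}
```

In this model a fixed 4-element vector (4, 0) and a scalable one (4, 4) coincide only when x happens to be 0, so the sizes are not known to be equal and the reinterpretation is refused, matching the "non-trivial conversion" error in the report.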
> > > > > > > > >> > >
> > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > > > > >> >
> > > > > > > > >> > It for sure papers over the issue. I think the error happens earlier,
> > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > > > > >> > which is the type of the LHS. But then you probably run into the
> > > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you
> > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > > >> Hi Richard,
> > > > > > > > >> Sorry I don't quite
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote: > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni > wrote: > > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener > > wrote: > > > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > > wrote: > > > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener > > > > wrote: > > > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > > wrote: > > > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > > wrote: > > > > > > > > > > > > > > Richard Biener writes: > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > > wrote: > > > > > > > >> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > > > > > > >> wrote: > > > > > > > >> > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via > > > > > > > >> > Gcc-patches > > > > > > > >> > wrote: > > > > > > > >> > > > > > > > > > >> > > Hi Richard, > > > > > > > >> > > For the following test: > > > > > > > >> > > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > > >> > > { > > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > >> > > } > > > > > > > >> > > > > > > > > > >> > > The compiler emits following ICE with -O3 > > > > > > > >> > > -mcpu=generic+sve: > > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in > > > > > > > >> > > ‘view_convert_expr’ > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > > >> > > | ^~ > > > > > > > >> > > svint32_t > > > > > > > >> > > __Int32x4_t > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, 
bool) > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > > >> > > 0xe93ccb execute_todo > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > > >> > > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to > > > > > > > >> > > vec_perm_expr, we have: > > > > > > > >> > > int32x4_t v; > > > > > > > >> > > __Int32x4_t _1; > > > > > > > >> > > svint32_t _9; > > > > > > > >> > > vector(4) int _11; > > > > > > > >> > > > > > > > > > >> > >: > > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > > >> > > v_12 = _1; > > > > > > > >> > > _11 = v_12; > > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > > >> > > return _9; > > > > > > > >> > > > > > > > > > >> > > During forwprop, simplify_permutation simplifies > > > > > > > >> > > vec_perm_expr to > > > > > > > >> > > view_convert_expr, > > > > > > > >> > > and the end result becomes: > > > > > > > >> > > svint32_t _7; > > > > > > > >> > > __Int32x4_t _8; > > > > > > > >> > > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > > >> > > ;;pred: ENTRY > > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > >> > > return _7; > > > > > > > >> > > ;;succ: EXIT > > > > > > > >> > > > > > > > > > >> > > which causes the error duing verify_gimple since > > > > > > > >> > > VIEW_CONVERT_EXPR > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > > >> > > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non > > > > > > > >> > > compatible types, > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct > > > > > > > >> > > approach ? > > > > > > > >> > > > > > > > > >> > It for sure papers over the issue. 
I think the error > > > > > > > >> > happens earlier, > > > > > > > >> > the V_C_E should have been built with the type of the > > > > > > > >> > VEC_PERM_EXPR > > > > > > > >> > which is the type of the LHS. But then you probably run > > > > > > > >> > into the > > > > > > > >> > different sizes ICE (VLA vs constant size). I think for > > > > > > > >> > this case you > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > > >> Hi Richard, > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR > > > > > > > >> to > > > > > > > >> represent dup operation > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > > >> BIT_FIELD_REF will work. > > > > > > > >> Could you please elaborate ? > > > > > > > >> > > > > > > > >> A
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni wrote: > > On Thu, 21 Jul 2022 at 12:21, Richard Biener > wrote: > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > wrote: > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener > > > wrote: > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > wrote: > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > wrote: > > > > > > > > > > > > Richard Biener writes: > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > wrote: > > > > > > >> > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > > > > > >> wrote: > > > > > > >> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via > > > > > > >> > Gcc-patches > > > > > > >> > wrote: > > > > > > >> > > > > > > > > >> > > Hi Richard, > > > > > > >> > > For the following test: > > > > > > >> > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > >> > > { > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > >> > > } > > > > > > >> > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in > > > > > > >> > > ‘view_convert_expr’ > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > >> > > | ^~ > > > > > > >> > > svint32_t > > > > > > >> > > __Int32x4_t > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > >> > > 
0xe93ccb execute_todo > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > >> > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to > > > > > > >> > > vec_perm_expr, we have: > > > > > > >> > > int32x4_t v; > > > > > > >> > > __Int32x4_t _1; > > > > > > >> > > svint32_t _9; > > > > > > >> > > vector(4) int _11; > > > > > > >> > > > > > > > > >> > >: > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > >> > > v_12 = _1; > > > > > > >> > > _11 = v_12; > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > >> > > return _9; > > > > > > >> > > > > > > > > >> > > During forwprop, simplify_permutation simplifies > > > > > > >> > > vec_perm_expr to > > > > > > >> > > view_convert_expr, > > > > > > >> > > and the end result becomes: > > > > > > >> > > svint32_t _7; > > > > > > >> > > __Int32x4_t _8; > > > > > > >> > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > >> > > ;;pred: ENTRY > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > >> > > return _7; > > > > > > >> > > ;;succ: EXIT > > > > > > >> > > > > > > > > >> > > which causes the error duing verify_gimple since > > > > > > >> > > VIEW_CONVERT_EXPR > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > >> > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible > > > > > > >> > > types, > > > > > > >> > > which resolves ICE, but am not sure if it's the correct > > > > > > >> > > approach ? > > > > > > >> > > > > > > > >> > It for sure papers over the issue. I think the error happens > > > > > > >> > earlier, > > > > > > >> > the V_C_E should have been built with the type of the > > > > > > >> > VEC_PERM_EXPR > > > > > > >> > which is the type of the LHS. 
But then you probably run into > > > > > > >> > the > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this > > > > > > >> > case you > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > >> Hi Richard, > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > > >> represent dup operation > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > >> BIT_FIELD_REF will work. > > > > > > >> Could you please elaborate ? > > > > > > >> > > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > > >> The following test case also ICE's during forwprop: > > > > > > >> svint32_t foo() > > > > > > >> { > > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > >> return v2;
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, 21 Jul 2022 at 12:21, Richard Biener wrote:
>
> On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni wrote:
> >
> > On Mon, 18 Jul 2022 at 11:57, Richard Biener wrote:
> > >
> > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni wrote:
> > > >
> > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford wrote:
> > > > >
> > > > > Richard Biener writes:
> > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni wrote:
> > > > > >>
> > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener wrote:
> > > > > >> >
> > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches wrote:
> > > > > >> > >
> > > > > >> > > Hi Richard,
> > > > > >> > > For the following test:
> > > > > >> > >
> > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > >> > > {
> > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >> > > }
> > > > > >> > >
> > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > >> > > foo.c: In function ‘f2’:
> > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > >> > >       |           ^~
> > > > > >> > > svint32_t
> > > > > >> > > __Int32x4_t
> > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > > during GIMPLE pass: forwprop
> > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > >> > > 0xe9371f execute_function_todo
> > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > >> > > 0xe93ccb execute_todo
> > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > >> > >
> > > > > >> > > This happens because, after folding svld1rq_s32 to
> > > > > >> > > vec_perm_expr, we have:
> > > > > >> > >   int32x4_t v;
> > > > > >> > >   __Int32x4_t _1;
> > > > > >> > >   svint32_t _9;
> > > > > >> > >   vector(4) int _11;
> > > > > >> > >
> > > > > >> > >   :
> > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > >> > >   v_12 = _1;
> > > > > >> > >   _11 = v_12;
> > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > >> > >   return _9;
> > > > > >> > >
> > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr
> > > > > >> > > to view_convert_expr,
> > > > > >> > > and the end result becomes:
> > > > > >> > >   svint32_t _7;
> > > > > >> > >   __Int32x4_t _8;
> > > > > >> > >
> > > > > >> > > ;; basic block 2, loop depth 0
> > > > > >> > > ;;    pred:       ENTRY
> > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > >   return _7;
> > > > > >> > > ;;    succ:       EXIT
> > > > > >> > >
> > > > > >> > > which causes the error during verify_gimple since
> > > > > >> > > VIEW_CONVERT_EXPR
> > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > >> > >
> > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > >> >
> > > > > >> > It for sure papers over the issue. I think the error happens earlier,
> > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > >> > which is the type of the LHS. But then you probably run into the
> > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you
> > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > >> > selecting the "low" part of the VLA vector.
> > > > > >> Hi Richard,
> > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > >> represent dup operation
> > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > >> BIT_FIELD_REF will work.
> > > > > >> Could you please elaborate ?
> > > > > >>
> > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > >> The following test case also ICE's during forwprop:
> > > > > >> svint32_t foo()
> > > > > >> {
> > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >>   return v2;
> > > > > >> }
> > > > > >>
> > > > > >> foo2.c: In function ‘foo’:
> > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > >>     9 | }
> > > > > >>       | ^
> > > > > >> svint32_t
> > > > > >> int32x4_t
> > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > >>
> > > > > >> because simplify
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni wrote: > > On Mon, 18 Jul 2022 at 11:57, Richard Biener > wrote: > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > wrote: > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > wrote: > > > > > > > > Richard Biener writes: > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > wrote: > > > > >> > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > > > >> wrote: > > > > >> > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > >> > wrote: > > > > >> > > > > > > >> > > Hi Richard, > > > > >> > > For the following test: > > > > >> > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > >> > > { > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > >> > > } > > > > >> > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > >> > > foo.c: In function ‘f2’: > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > >> > > | ^~ > > > > >> > > svint32_t > > > > >> > > __Int32x4_t > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > >> > > during GIMPLE pass: forwprop > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > >> > > 0xe9371f execute_function_todo > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > >> > > 0xe93ccb execute_todo > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > >> > > > > > > >> > > This happens because, after folding svld1rq_s32 to > > > > >> > > vec_perm_expr, we have: > > > > >> > > int32x4_t v; > > > > >> > > __Int32x4_t _1; > > > > >> > > svint32_t _9; > > > > >> > > vector(4) int _11; > > > > >> > > > > > > >> > >: > > > > 
>> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > >> > > v_12 = _1; > > > > >> > > _11 = v_12; > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > >> > > return _9; > > > > >> > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > >> > > view_convert_expr, > > > > >> > > and the end result becomes: > > > > >> > > svint32_t _7; > > > > >> > > __Int32x4_t _8; > > > > >> > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > >> > > ;;pred: ENTRY > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > >> > > return _7; > > > > >> > > ;;succ: EXIT > > > > >> > > > > > > >> > > which causes the error duing verify_gimple since > > > > >> > > VIEW_CONVERT_EXPR > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > >> > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible > > > > >> > > types, > > > > >> > > which resolves ICE, but am not sure if it's the correct approach > > > > >> > > ? > > > > >> > > > > > >> > It for sure papers over the issue. I think the error happens > > > > >> > earlier, > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > >> > which is the type of the LHS. But then you probably run into the > > > > >> > different sizes ICE (VLA vs constant size). I think for this case > > > > >> > you > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > >> > selecting the "low" part of the VLA vector. > > > > >> Hi Richard, > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > >> represent dup operation > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > >> BIT_FIELD_REF will work. > > > > >> Could you please elaborate ? > > > > >> > > > > >> Also, the issue doesn't seem restricted to this case. 
> > > > >> The following test case also ICE's during forwprop: > > > > >> svint32_t foo() > > > > >> { > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > >> return v2; > > > > >> } > > > > >> > > > > >> foo2.c: In function ‘foo’: > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > >> 9 | } > > > > >> | ^ > > > > >> svint32_t > > > > >> int32x4_t > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > >> > > > > >> because simplify_permutation folds > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > >> into: > > > > >> vector_cst {1, 2, 3, 4} > > > > >> > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > >> support assignment of vector_cst to VLA vector. > > > > >> > > > > >> I guess the issue really is that currently,
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Mon, 18 Jul 2022 at 11:57, Richard Biener wrote: > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > wrote: > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > wrote: > > > > > > Richard Biener writes: > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > wrote: > > > >> > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > > >> wrote: > > > >> > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > >> > wrote: > > > >> > > > > > >> > > Hi Richard, > > > >> > > For the following test: > > > >> > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > >> > > { > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> > > } > > > >> > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > >> > > foo.c: In function ‘f2’: > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > >> > > | ^~ > > > >> > > svint32_t > > > >> > > __Int32x4_t > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > >> > > during GIMPLE pass: forwprop > > > >> > > dump file: foo.c.109t.forwprop2 > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > >> > > 0xe9371f execute_function_todo > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > >> > > 0xe93ccb execute_todo > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > >> > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, > > > >> > > we have: > > > >> > > int32x4_t v; > > > >> > > __Int32x4_t _1; > > > >> > > svint32_t _9; > > > >> > > vector(4) int _11; > > > >> > > > > > >> > >: > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > >> > > v_12 = _1; > > > >> > > _11 = v_12; > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... 
}>; > > > >> > > return _9; > > > >> > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > >> > > view_convert_expr, > > > >> > > and the end result becomes: > > > >> > > svint32_t _7; > > > >> > > __Int32x4_t _8; > > > >> > > > > > >> > > ;; basic block 2, loop depth 0 > > > >> > > ;;pred: ENTRY > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > >> > > return _7; > > > >> > > ;;succ: EXIT > > > >> > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > >> > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > >> > > > > >> > It for sure papers over the issue. I think the error happens > > > >> > earlier, > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > >> > which is the type of the LHS. But then you probably run into the > > > >> > different sizes ICE (VLA vs constant size). I think for this case > > > >> > you > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > >> > selecting the "low" part of the VLA vector. > > > >> Hi Richard, > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > >> represent dup operation > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > >> BIT_FIELD_REF will work. > > > >> Could you please elaborate ? > > > >> > > > >> Also, the issue doesn't seem restricted to this case. 
> > > >> The following test case also ICE's during forwprop: > > > >> svint32_t foo() > > > >> { > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> return v2; > > > >> } > > > >> > > > >> foo2.c: In function ‘foo’: > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > >> 9 | } > > > >> | ^ > > > >> svint32_t > > > >> int32x4_t > > > >> v2_4 = { 1, 2, 3, 4 }; > > > >> > > > >> because simplify_permutation folds > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > >> into: > > > >> vector_cst {1, 2, 3, 4} > > > >> > > > >> and it complains during verify_gimple_assign_single because we don't > > > >> support assignment of vector_cst to VLA vector. > > > >> > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > >> supports lhs and rhs > > > >> to have vector types with differing lengths, and simplifying it to > > > >> other tree codes, like above, > > > >> will result in type errors ? > > > > > > > > That might be the case - Richard should know. > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > or
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni wrote: > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > wrote: > > > > Richard Biener writes: > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > wrote: > > >> > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > > >> wrote: > > >> > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > >> > wrote: > > >> > > > > >> > > Hi Richard, > > >> > > For the following test: > > >> > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > >> > > { > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > >> > > } > > >> > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > >> > > foo.c: In function ‘f2’: > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > >> > > | ^~ > > >> > > svint32_t > > >> > > __Int32x4_t > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > >> > > during GIMPLE pass: forwprop > > >> > > dump file: foo.c.109t.forwprop2 > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > >> > > 0xe9371f execute_function_todo > > >> > > ../../gcc/gcc/passes.cc:2091 > > >> > > 0xe93ccb execute_todo > > >> > > ../../gcc/gcc/passes.cc:2145 > > >> > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we > > >> > > have: > > >> > > int32x4_t v; > > >> > > __Int32x4_t _1; > > >> > > svint32_t _9; > > >> > > vector(4) int _11; > > >> > > > > >> > >: > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > >> > > v_12 = _1; > > >> > > _11 = v_12; > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... 
}>; > > >> > > return _9; > > >> > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > >> > > view_convert_expr, > > >> > > and the end result becomes: > > >> > > svint32_t _7; > > >> > > __Int32x4_t _8; > > >> > > > > >> > > ;; basic block 2, loop depth 0 > > >> > > ;;pred: ENTRY > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > >> > > return _7; > > >> > > ;;succ: EXIT > > >> > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > >> > > has incompatible types (svint32_t, int32x4_t). > > >> > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > >> > > > >> > It for sure papers over the issue. I think the error happens earlier, > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > >> > which is the type of the LHS. But then you probably run into the > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > >> > selecting the "low" part of the VLA vector. > > >> Hi Richard, > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > >> represent dup operation > > >> from fixed width to VLA vector. I am not sure how folding it to > > >> BIT_FIELD_REF will work. > > >> Could you please elaborate ? > > >> > > >> Also, the issue doesn't seem restricted to this case. 
> > >> The following test case also ICE's during forwprop: > > >> svint32_t foo() > > >> { > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > >> return v2; > > >> } > > >> > > >> foo2.c: In function ‘foo’: > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > >> 9 | } > > >> | ^ > > >> svint32_t > > >> int32x4_t > > >> v2_4 = { 1, 2, 3, 4 }; > > >> > > >> because simplify_permutation folds > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > >> into: > > >> vector_cst {1, 2, 3, 4} > > >> > > >> and it complains during verify_gimple_assign_single because we don't > > >> support assignment of vector_cst to VLA vector. > > >> > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > >> supports lhs and rhs > > >> to have vector types with differing lengths, and simplifying it to > > >> other tree codes, like above, > > >> will result in type errors ? > > > > > > That might be the case - Richard should know. > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > or about the VLA vs. VLS thing. We would have the same issue trying > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > tree codes whose input types are/can be different from their output > > types. > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > of type T needs to be r
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, 14 Jul 2022 at 17:22, Richard Sandiford wrote: > > Richard Biener writes: > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > wrote: > >> > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener > >> wrote: > >> > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > >> > wrote: > >> > > > >> > > Hi Richard, > >> > > For the following test: > >> > > > >> > > svint32_t f2(int a, int b, int c, int d) > >> > > { > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > >> > > } > >> > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > >> > > foo.c: In function ‘f2’: > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > >> > > | ^~ > >> > > svint32_t > >> > > __Int32x4_t > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > >> > > during GIMPLE pass: forwprop > >> > > dump file: foo.c.109t.forwprop2 > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > >> > > 0xe9371f execute_function_todo > >> > > ../../gcc/gcc/passes.cc:2091 > >> > > 0xe93ccb execute_todo > >> > > ../../gcc/gcc/passes.cc:2145 > >> > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we > >> > > have: > >> > > int32x4_t v; > >> > > __Int32x4_t _1; > >> > > svint32_t _9; > >> > > vector(4) int _11; > >> > > > >> > >: > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > >> > > v_12 = _1; > >> > > _11 = v_12; > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... 
}>; > >> > > return _9; > >> > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > >> > > view_convert_expr, > >> > > and the end result becomes: > >> > > svint32_t _7; > >> > > __Int32x4_t _8; > >> > > > >> > > ;; basic block 2, loop depth 0 > >> > > ;;pred: ENTRY > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > >> > > return _7; > >> > > ;;succ: EXIT > >> > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > >> > > has incompatible types (svint32_t, int32x4_t). > >> > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > >> > > which resolves ICE, but am not sure if it's the correct approach ? > >> > > >> > It for sure papers over the issue. I think the error happens earlier, > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > >> > which is the type of the LHS. But then you probably run into the > >> > different sizes ICE (VLA vs constant size). I think for this case you > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > >> > selecting the "low" part of the VLA vector. > >> Hi Richard, > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > >> represent dup operation > >> from fixed width to VLA vector. I am not sure how folding it to > >> BIT_FIELD_REF will work. > >> Could you please elaborate ? > >> > >> Also, the issue doesn't seem restricted to this case. 
> >> The following test case also ICE's during forwprop: > >> svint32_t foo() > >> { > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > >> return v2; > >> } > >> > >> foo2.c: In function ‘foo’: > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > >> 9 | } > >> | ^ > >> svint32_t > >> int32x4_t > >> v2_4 = { 1, 2, 3, 4 }; > >> > >> because simplify_permutation folds > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > >> into: > >> vector_cst {1, 2, 3, 4} > >> > >> and it complains during verify_gimple_assign_single because we don't > >> support assignment of vector_cst to VLA vector. > >> > >> I guess the issue really is that currently, only VEC_PERM_EXPR > >> supports lhs and rhs > >> to have vector types with differing lengths, and simplifying it to > >> other tree codes, like above, > >> will result in type errors ? > > > > That might be the case - Richard should know. > > I don't see anything particularly special about VEC_PERM_EXPR here, > or about the VLA vs. VLS thing. We would have the same issue trying > to build a 128-bit vector from 2 64-bit vectors. And there are other > tree codes whose input types are/can be different from their output > types. > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > of type T needs to be replaced by something of type T. Whether T has a > constant size or a variable size doesn't matter. > > > If so your type check > > is still too late, you should instead recognize that we are permuting > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > paths - that probably means just allow
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
Richard Biener writes: > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > wrote: >> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener >> wrote: >> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches >> > wrote: >> > > >> > > Hi Richard, >> > > For the following test: >> > > >> > > svint32_t f2(int a, int b, int c, int d) >> > > { >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); >> > > } >> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: >> > > foo.c: In function ‘f2’: >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ >> > > 4 | svint32_t f2(int a, int b, int c, int d) >> > > | ^~ >> > > svint32_t >> > > __Int32x4_t >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > during GIMPLE pass: forwprop >> > > dump file: foo.c.109t.forwprop2 >> > > foo.c:4:11: internal compiler error: verify_gimple failed >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) >> > > ../../gcc/gcc/tree-cfg.cc:5568 >> > > 0xe9371f execute_function_todo >> > > ../../gcc/gcc/passes.cc:2091 >> > > 0xe93ccb execute_todo >> > > ../../gcc/gcc/passes.cc:2145 >> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we >> > > have: >> > > int32x4_t v; >> > > __Int32x4_t _1; >> > > svint32_t _9; >> > > vector(4) int _11; >> > > >> > >: >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; >> > > v_12 = _1; >> > > _11 = v_12; >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... 
}>; >> > > return _9; >> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to >> > > view_convert_expr, >> > > and the end result becomes: >> > > svint32_t _7; >> > > __Int32x4_t _8; >> > > >> > > ;; basic block 2, loop depth 0 >> > > ;;pred: ENTRY >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > return _7; >> > > ;;succ: EXIT >> > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR >> > > has incompatible types (svint32_t, int32x4_t). >> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR >> > > in simplify_permutation, if lhs and rhs have non compatible types, >> > > which resolves ICE, but am not sure if it's the correct approach ? >> > >> > It for sure papers over the issue. I think the error happens earlier, >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR >> > which is the type of the LHS. But then you probably run into the >> > different sizes ICE (VLA vs constant size). I think for this case you >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, >> > selecting the "low" part of the VLA vector. >> Hi Richard, >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to >> represent dup operation >> from fixed width to VLA vector. I am not sure how folding it to >> BIT_FIELD_REF will work. >> Could you please elaborate ? >> >> Also, the issue doesn't seem restricted to this case. 
>> The following test case also ICE's during forwprop:
>> svint32_t foo()
>> {
>>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
>>   return v2;
>> }
>>
>> foo2.c: In function ‘foo’:
>> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
>>     9 | }
>>       | ^
>> svint32_t
>> int32x4_t
>> v2_4 = { 1, 2, 3, 4 };
>>
>> because simplify_permutation folds
>> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
>> into:
>> vector_cst {1, 2, 3, 4}
>>
>> and it complains during verify_gimple_assign_single because we don't
>> support assignment of vector_cst to VLA vector.
>>
>> I guess the issue really is that currently, only VEC_PERM_EXPR
>> supports lhs and rhs
>> to have vector types with differing lengths, and simplifying it to
>> other tree codes, like above,
>> will result in type errors ?
>
> That might be the case - Richard should know.

I don't see anything particularly special about VEC_PERM_EXPR here,
or about the VLA vs. VLS thing. We would have the same issue trying
to build a 128-bit vector from 2 64-bit vectors. And there are other
tree codes whose input types are/can be different from their output
types.

So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
of type T needs to be replaced by something of type T. Whether T has a
constant size or a variable size doesn't matter.

> If so your type check
> is still too late, you should instead recognize that we are permuting
> a VLA vector and then refuse to go any of the non-VEC_PERM generating
> paths - that probably means just allowing the code == VEC_PERM_EXPR
> case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Yeah. If we're talking about the match.pd code, I think only:

  (if (sel.series_p (0, 1, 0, 1))
   { op0; }
   (if (sel.series_p (0, 1, nelts, 1))
    { op1; }

need a type compatibility check.

For fold_vec_perm I think we
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni wrote: > > On Wed, 13 Jul 2022 at 12:22, Richard Biener > wrote: > > > > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > wrote: > > > > > > Hi Richard, > > > For the following test: > > > > > > svint32_t f2(int a, int b, int c, int d) > > > { > > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > } > > > > > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > foo.c: In function ‘f2’: > > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > 4 | svint32_t f2(int a, int b, int c, int d) > > > | ^~ > > > svint32_t > > > __Int32x4_t > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > during GIMPLE pass: forwprop > > > dump file: foo.c.109t.forwprop2 > > > foo.c:4:11: internal compiler error: verify_gimple failed > > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > ../../gcc/gcc/tree-cfg.cc:5568 > > > 0xe9371f execute_function_todo > > > ../../gcc/gcc/passes.cc:2091 > > > 0xe93ccb execute_todo > > > ../../gcc/gcc/passes.cc:2145 > > > > > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > int32x4_t v; > > > __Int32x4_t _1; > > > svint32_t _9; > > > vector(4) int _11; > > > > > >: > > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > v_12 = _1; > > > _11 = v_12; > > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > return _9; > > > > > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > view_convert_expr, > > > and the end result becomes: > > > svint32_t _7; > > > __Int32x4_t _8; > > > > > > ;; basic block 2, loop depth 0 > > > ;;pred: ENTRY > > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > return _7; > > > ;;succ: EXIT > > > > > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > has incompatible types (svint32_t, int32x4_t). 
> > > > > > The attached patch disables simplification of VEC_PERM_EXPR > > > in simplify_permutation, if lhs and rhs have non compatible types, > > > which resolves ICE, but am not sure if it's the correct approach ? > > > > It for sure papers over the issue. I think the error happens earlier, > > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > which is the type of the LHS. But then you probably run into the > > different sizes ICE (VLA vs constant size). I think for this case you > > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > selecting the "low" part of the VLA vector. > Hi Richard, > Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > represent dup operation > from fixed width to VLA vector. I am not sure how folding it to > BIT_FIELD_REF will work. > Could you please elaborate ? > > Also, the issue doesn't seem restricted to this case. > The following test case also ICE's during forwprop: > svint32_t foo() > { > int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > return v2; > } > > foo2.c: In function ‘foo’: > foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > 9 | } > | ^ > svint32_t > int32x4_t > v2_4 = { 1, 2, 3, 4 }; > > because simplify_permutation folds > VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > into: > vector_cst {1, 2, 3, 4} > > and it complains during verify_gimple_assign_single because we don't > support assignment of vector_cst to VLA vector. > > I guess the issue really is that currently, only VEC_PERM_EXPR > supports lhs and rhs > to have vector types with differing lengths, and simplifying it to > other tree codes, like above, > will result in type errors ? That might be the case - Richard should know. 
If so, your type check is still too late; you should instead recognize
that we are permuting a VLA vector and then refuse to take any of the
non-VEC_PERM generating paths - that probably means just allowing the
code == VEC_PERM_EXPR case and not any of the
CTOR/CST/VIEW_CONVERT_EXPR cases?

Richard.

>
> Thanks,
> Prathamesh
> >
> > >
> > > Alternatively, should we allow assignments from fixed-width to SVE
> > > vector, so the above
> > > VIEW_CONVERT_EXPR would result in dup ?
> > >
> > > Thanks,
> > > Prathamesh
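A minimal sketch of the gating suggested in this message, written as a toy model in C rather than as GCC code. Everything here is an assumption made for illustration: `toy_code`, `toy_fold_allowed` and the `result_is_vla` flag are hypothetical names, and the enum only mirrors the four cases the message lists (VEC_PERM_EXPR, CTOR, CST, VIEW_CONVERT_EXPR).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the four cases named in the message.
   These are NOT GCC's tree codes.  */
enum toy_code {
    TOY_VEC_PERM_EXPR,
    TOY_CONSTRUCTOR,
    TOY_VECTOR_CST,
    TOY_VIEW_CONVERT_EXPR
};

/* Return true if the simplification path for CODE may be taken.
   When the permute produces a variable-length (VLA) vector, only the
   VEC_PERM_EXPR path is allowed; every other path is refused.
   Fixed-length results keep all four paths.  */
bool toy_fold_allowed(bool result_is_vla, enum toy_code code)
{
    /* Reject values outside the toy enum.  */
    assert(code <= TOY_VIEW_CONVERT_EXPR);

    if (!result_is_vla)
        return true;
    return code == TOY_VEC_PERM_EXPR;
}
```

With `result_is_vla` set, `toy_fold_allowed` accepts only `TOY_VEC_PERM_EXPR`, so the CONSTRUCTOR, VECTOR_CST and VIEW_CONVERT_EXPR paths that produced the ill-typed statements in the reports above would be skipped in this model.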
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Wed, 13 Jul 2022 at 12:22, Richard Biener wrote:
>
> On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> wrote:
> >
> > Hi Richard,
> > For the following test:
> >
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > The compiler emits the following ICE with -O3 -mcpu=generic+sve:
> > foo.c: In function ‘f2’:
> > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> >     4 | svint32_t f2(int a, int b, int c, int d)
> >       |           ^~
> > svint32_t
> > __Int32x4_t
> > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > during GIMPLE pass: forwprop
> > dump file: foo.c.109t.forwprop2
> > foo.c:4:11: internal compiler error: verify_gimple failed
> > 0xfda04a verify_gimple_in_cfg(function*, bool)
> >         ../../gcc/gcc/tree-cfg.cc:5568
> > 0xe9371f execute_function_todo
> >         ../../gcc/gcc/passes.cc:2091
> > 0xe93ccb execute_todo
> >         ../../gcc/gcc/passes.cc:2145
> >
> > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> >   int32x4_t v;
> >   __Int32x4_t _1;
> >   svint32_t _9;
> >   vector(4) int _11;
> >
> >   :
> >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >   v_12 = _1;
> >   _11 = v_12;
> >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >   return _9;
> >
> > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > view_convert_expr,
> > and the end result becomes:
> >   svint32_t _7;
> >   __Int32x4_t _8;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >   return _7;
> > ;;    succ:       EXIT
> >
> > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > has incompatible types (svint32_t, int32x4_t).
> >
> > The attached patch disables simplification of VEC_PERM_EXPR
> > in simplify_permutation, if lhs and rhs have incompatible types,
> > which resolves the ICE, but I am not sure if it's the correct approach?
>
> It for sure papers over the issue. I think the error happens earlier,
> the V_C_E should have been built with the type of the VEC_PERM_EXPR
> which is the type of the LHS. But then you probably run into the
> different sizes ICE (VLA vs constant size). I think for this case you
> want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> selecting the "low" part of the VLA vector.

Hi Richard,
Sorry, I don't quite follow. In this case, we use VEC_PERM_EXPR to
represent a dup operation from a fixed-width vector to a VLA vector.
I am not sure how folding it to BIT_FIELD_REF will work. Could you
please elaborate?

Also, the issue doesn't seem restricted to this case.
The following test case also ICEs during forwprop:

svint32_t foo()
{
  int32x4_t v = (int32x4_t) {1, 2, 3, 4};
  svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
  return v2;
}

foo2.c: In function ‘foo’:
foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
    9 | }
      | ^
svint32_t
int32x4_t
v2_4 = { 1, 2, 3, 4 };

because simplify_permutation folds
VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
into:
vector_cst {1, 2, 3, 4}

and it complains during verify_gimple_assign_single because we don't
support assignment of vector_cst to a VLA vector.

I guess the issue really is that currently, only VEC_PERM_EXPR
supports lhs and rhs having vector types with differing lengths, and
simplifying it to other tree codes, like above, will result in type
errors?

Thanks,
Prathamesh
>
> >
> > Alternatively, should we allow assignments from fixed-width to SVE
> > vector, so the above
> > VIEW_CONVERT_EXPR would result in dup ?
> >
> > Thanks,
> > Prathamesh
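
For readers following the thread, here is a minimal sketch in plain C of the dup semantics discussed in the reply above: a 4-element fixed-width vector replicated across a longer result, which is what svld1rq_s32 computes and what the selector { 0, 1, 2, 3, ... } is meant to express. The helper name toy_dup_quadword and the idea of passing the result length as a runtime parameter are assumptions for illustration only; nothing here is GCC code or the ACLE API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC or ACLE function): models the dup that
   the thread represents with VEC_PERM_EXPR.  The 4-element input is
   repeated until the result is full, so out[i] = in[i % 4].
   out_len stands in for the VLA vector's element count, which is only
   known at run time but is always a multiple of 4 for 32-bit elements.  */
static void toy_dup_quadword(const int32_t in[4], int32_t *out, size_t out_len)
{
    assert(out_len % 4 == 0);
    for (size_t i = 0; i < out_len; i++)
        out[i] = in[i % 4];
}
```

With in = {1, 2, 3, 4} and out_len = 8, the result is {1, 2, 3, 4, 1, 2, 3, 4}. The result has more elements than the input whenever out_len > 4, which is why a tree code that requires equal-sized operand and result types cannot stand in for the permute.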
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches wrote:
>
> Hi Richard,
> For the following test:
>
> svint32_t f2(int a, int b, int c, int d)
> {
>   int32x4_t v = (int32x4_t) {a, b, c, d};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> The compiler emits the following ICE with -O3 -mcpu=generic+sve:
> foo.c: In function ‘f2’:
> foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
>     4 | svint32_t f2(int a, int b, int c, int d)
>       |           ^~
> svint32_t
> __Int32x4_t
> _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> during GIMPLE pass: forwprop
> dump file: foo.c.109t.forwprop2
> foo.c:4:11: internal compiler error: verify_gimple failed
> 0xfda04a verify_gimple_in_cfg(function*, bool)
>         ../../gcc/gcc/tree-cfg.cc:5568
> 0xe9371f execute_function_todo
>         ../../gcc/gcc/passes.cc:2091
> 0xe93ccb execute_todo
>         ../../gcc/gcc/passes.cc:2145
>
> This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
>   int32x4_t v;
>   __Int32x4_t _1;
>   svint32_t _9;
>   vector(4) int _11;
>
>   :
>   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>   v_12 = _1;
>   _11 = v_12;
>   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>   return _9;
>
> During forwprop, simplify_permutation simplifies vec_perm_expr to
> view_convert_expr,
> and the end result becomes:
>   svint32_t _7;
>   __Int32x4_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>   return _7;
> ;;    succ:       EXIT
>
> which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> has incompatible types (svint32_t, int32x4_t).
>
> The attached patch disables simplification of VEC_PERM_EXPR
> in simplify_permutation, if lhs and rhs have incompatible types,
> which resolves the ICE, but I am not sure if it's the correct approach?

It for sure papers over the issue. I think the error happens earlier,
the V_C_E should have been built with the type of the VEC_PERM_EXPR
which is the type of the LHS. But then you probably run into the
different sizes ICE (VLA vs constant size). I think for this case you
want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
selecting the "low" part of the VLA vector.

>
> Alternatively, should we allow assignments from fixed-width to SVE
> vector, so the above
> VIEW_CONVERT_EXPR would result in dup ?
>
> Thanks,
> Prathamesh
ICE after folding svld1rq to vec_perm_expr duing forwprop
Hi Richard,
For the following test:

svint32_t f2(int a, int b, int c, int d)
{
  int32x4_t v = (int32x4_t) {a, b, c, d};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

The compiler emits the following ICE with -O3 -mcpu=generic+sve:
foo.c: In function ‘f2’:
foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
    4 | svint32_t f2(int a, int b, int c, int d)
      |           ^~
svint32_t
__Int32x4_t
_7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
during GIMPLE pass: forwprop
dump file: foo.c.109t.forwprop2
foo.c:4:11: internal compiler error: verify_gimple failed
0xfda04a verify_gimple_in_cfg(function*, bool)
        ../../gcc/gcc/tree-cfg.cc:5568
0xe9371f execute_function_todo
        ../../gcc/gcc/passes.cc:2091
0xe93ccb execute_todo
        ../../gcc/gcc/passes.cc:2145

This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
  int32x4_t v;
  __Int32x4_t _1;
  svint32_t _9;
  vector(4) int _11;

  :
  _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
  v_12 = _1;
  _11 = v_12;
  _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
  return _9;

During forwprop, simplify_permutation simplifies vec_perm_expr to
view_convert_expr, and the end result becomes:
  svint32_t _7;
  __Int32x4_t _8;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
  _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
  return _7;
;;    succ:       EXIT

which causes the error during verify_gimple since VIEW_CONVERT_EXPR
has incompatible types (svint32_t, int32x4_t).

The attached patch disables simplification of VEC_PERM_EXPR
in simplify_permutation, if lhs and rhs have incompatible types,
which resolves the ICE, but I am not sure if it's the correct approach?

Alternatively, should we allow assignments from fixed-width to SVE
vector, so the above VIEW_CONVERT_EXPR would result in dup?

Thanks,
Prathamesh

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 69567ab3275..be888f1c48e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2414,6 +2414,9 @@ simplify_permutation (gimple_stmt_iterator *gsi)
   if (TREE_CODE (op2) != VECTOR_CST)
     return 0;
 
+  if (!types_compatible_p (TREE_TYPE (gimple_get_lhs (stmt)), TREE_TYPE (op0)))
+    return 0;
+
   if (TREE_CODE (op0) == VECTOR_CST)
     {
       code = VECTOR_CST;
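
To make the intent of the guard in the patch above concrete, here is a toy model in standalone C. It is only a sketch: struct toy_vec_type, toy_types_compatible and toy_may_simplify_permutation are hypothetical names invented for this illustration, and the compatibility rule is a simplification, not GCC's types_compatible_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vector type (an assumption for illustration).
   A fixed-width vector has exactly min_nunits elements; a VLA vector
   has min_nunits * N elements for some N >= 1 known only at run time.  */
struct toy_vec_type {
    unsigned elem_bits;   /* width of one element in bits */
    unsigned min_nunits;  /* element count (minimum count for VLA) */
    bool is_vla;          /* true for scalable vectors such as svint32_t */
};

/* Simplified compatibility test: all three properties must agree.  */
static bool toy_types_compatible(struct toy_vec_type a, struct toy_vec_type b)
{
    return a.elem_bits == b.elem_bits
           && a.min_nunits == b.min_nunits
           && a.is_vla == b.is_vla;
}

/* Same shape as the guard in the patch: refuse to simplify the permute
   when the result type differs from the type of the first operand.  */
static bool toy_may_simplify_permutation(struct toy_vec_type lhs,
                                         struct toy_vec_type op0)
{
    return toy_types_compatible(lhs, op0);
}
```

Under this model, a permute from int32x4_t (32-bit elements, 4 units, fixed) to svint32_t (32-bit elements, at least 4 units, VLA) is left alone, while a permute whose operand and result are both int32x4_t can still be simplified.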