Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-09-05 Thread Richard Biener via Gcc-patches
On Mon, Sep 5, 2022 at 11:27 AM Prathamesh Kulkarni
 wrote:
> [...]

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-09-05 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 5 Sept 2022 at 14:39, Richard Biener  wrote:
> [...]

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-09-05 Thread Richard Biener via Gcc-patches
On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni
 wrote:
> [...]

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-09-05 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni
 wrote:
> [...]

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-28 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
 wrote:
> [...]

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-18 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
 wrote:
>
> On Wed, 17 Aug 2022 at 17:01, Richard Biener  
> wrote:
> >
> > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener 
> > > >  wrote:
> > > >>
> > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > >>  wrote:
> > > >> >
> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener 
> > > >> >  wrote:
> > > >> > >
> > > >> > >   /* If result vector has greater length than input vector,
> > > >> > > +     then allow permuting two vectors as long as:
> > > >> > > +     a) sel.nelts_per_pattern == 1
> > > >> > > +     b) sel.npatterns == len of input vector.
> > > >> > > +     The intent is to permute input vectors, and
> > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > >> > > +
> > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > >> > > +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > >> > > +    {
> > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > >> > > +          || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > >> > > +        return NULL_TREE;
> > > >> > > +    }
> > > >> > >
> > > >> > > so the only case you add is non-VLA to VLA and there
> > > >> > > explicitly only the case of a period that's same as the
> > > >> > > element count in the input vectors.
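To make the condition above concrete, here is a minimal sketch in plain C rather than GCC internals. The struct `sel_encoding` and the function `widening_perm_allowed` are hypothetical names standing in for `sel.encoding ()` and the quoted check. Sizes are plain integers instead of poly_ints, so this only models the case where the input length is a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the selector encoding, not a GCC type.  */
struct sel_encoding
{
  unsigned npatterns;          /* number of interleaved patterns */
  unsigned nelts_per_pattern;  /* encoded elements per pattern */
};

/* Return true if a permute whose result is longer than its inputs would
   pass the quoted check: each pattern is a single repeated element, and
   the repetition period equals the input element count.  */
bool
widening_perm_allowed (struct sel_encoding enc, unsigned input_nelts)
{
  assert (input_nelts > 0);
  return enc.nelts_per_pattern == 1 && enc.npatterns == input_nelts;
}
```

With a 4-element input, an encoding of 4 patterns with 1 element each passes, while a stepped encoding (3 elements per pattern) or a period of 2 is rejected, which matches the "period same as the element count" reading above.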
> > > >> > >
> > > >> > >
> > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> > > >> > >         pp_space (pp);
> > > >> > >       }
> > > >> > >   }
> > > >> > > +   if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > >> > > +       && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > >> > > +     pp_string (pp, ", ... ");
> > > >> > >     pp_right_brace (pp);
> > > >> > >
> > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > >> > Well, it got created for the following case after folding:
> > > >> > svint32_t f2(int a, int b, int c, int d)
> > > >> > {
> > > >> >   int32x4_t v = {a, b, c, d};
> > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > }
> > > >> >
> > > >> > The svld1rq_s32 call gets folded to:
> > > >> > v = {a, b, c, d}
> > > >> > lhs = VEC_PERM_EXPR
> > > >> >
> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > >> > need_ctor is thus true:
> > > >> > lhs = {a, b, c, d, ...}
> > > >> > I added "..." to make it more explicit that it's a VLA constructor.
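As a rough illustration of what `{a, b, c, d, ...}` is meant to denote here, the sketch below models the replicate-quadword behaviour in plain C: the four input elements repeat across the whole result. The function name `replicate_quad` and the `nelts` parameter (standing in for the runtime vector length) are assumptions made for this example, and the predicate is treated as all-true. It is not how GCC or the SVE intrinsic is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..nelts) by repeating the four elements of quad.
   nelts models the runtime vector length in 32-bit elements and is
   assumed to be a multiple of 4 (one quadword holds four int32_t).  */
void
replicate_quad (const int32_t quad[4], int32_t *out, size_t nelts)
{
  assert (nelts % 4 == 0);
  for (size_t i = 0; i < nelts; i++)
    out[i] = quad[i % 4];
}
```

For a 256-bit vector (`nelts == 8`) and inputs a, b, c, d, this yields a, b, c, d, a, b, c, d. A constructor that merely zero-filled the tail would give a different result.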
> > > >>
> > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > >> but do we actually make sure to do this duplication?
> > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > constructor looks pretty bad.
> > > > Should we avoid folding VLA constructors for now ?
> > >
> > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > we even use that though (perhaps we do and I've forgotten).
> > >
> > > > I guess these are 2 different issues:
> > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > >
> > > > For (a), I think the issue with using:
> > > > res_type = gimple_assign_lhs (stmt)
> > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > if we go thru
> > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > in fold_vec_perm.
> > > >
> > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the
> > > > following semantics:
> > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > >
> > > Yeah.
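The two rules above can be sketched with a small scalar model in C. The name `vec_perm_model` and its parameters are made up for this illustration. The shared element type (rule 1) is fixed to `int`, and rule 2 is captured by passing one length for both inputs and one length for the mask and result. Selector values are reduced modulo twice the input length, which is the documented behaviour for the two-operand fixed-length case. How that carries over to variable-length vectors is exactly what the thread is discussing, so treat this as a model only.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of lhs = VEC_PERM_EXPR <rhs1, rhs2, mask>.
   rhs1 and rhs2 each have in_len elements (len(rhs1) == len(rhs2));
   mask and lhs each have out_len elements (len(lhs) == len(mask)).
   Selector values 0..in_len-1 pick from rhs1 and in_len..2*in_len-1
   pick from rhs2.  */
void
vec_perm_model (const int *rhs1, const int *rhs2, size_t in_len,
                const unsigned *mask, int *lhs, size_t out_len)
{
  assert (in_len > 0);
  for (size_t i = 0; i < out_len; i++)
    {
      size_t idx = mask[i] % (2 * in_len);
      lhs[i] = idx < in_len ? rhs1[idx] : rhs2[idx - in_len];
    }
}
```

Note that `out_len` may exceed `in_len`: with 4-element inputs and an 8-element mask, the result has 8 elements, which is the differing-lengths case described in (b).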
> > >
> > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > >                               TYPE_VECTOR_SUBPARTS (op2))
> > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > Does that look reasonable ?
> > > >
> > > > > If we need a cast from res_type to lhs_type, then both would be
> > > > > fixed width vectors with len(lhs_type) being a multiple of
> > > > > len(res_type).
> > > > > IIUC, we don't support casting from VLA vector to/from fixed width
> > > > > vector,
> > >
> > > Yes, that's not supported as a cast.  If the compiler knows the
> > > length

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-18 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 17 Aug 2022 at 17:01, Richard Biener  wrote:
>
> On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:
> > > On Tue, 9 Aug 2022 at 18:42, Richard Biener  
> > > wrote:
> > >>
> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > >>  wrote:
> > >> >
> > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener 
> > >> >  w>> > >
> > >> > >
> > >> > >   /* If result vector has greater length than input vector,
> > >> > > + then allow permuting two vectors as long as:
> > >> > > + a) sel.nelts_per_pattern == 1
> > >> > > + b) sel.npatterns == len of input vector.
> > >> > > + The intent is to permute input vectors, and
> > >> > > + dup the elements in resulting vector to target vector length.  
> > >> > > */
> > >> > > +
> > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > >> > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0
> > >> > > +{
> > >> > > +  nelts = sel.encoding ().npatterns ();
> > >> > > +  if (sel.encoding ().nelts_per_pattern () != 1
> > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE 
> > >> > > (arg0)
> > >> > > +   return NULL_TREE;
> > >> > > +}
> > >> > >
> > >> > > so the only case you add is non-VLA to VLA and there
> > >> > > explicitely only the case of a period that's same as the
> > >> > > element count in the input vectors.
> > >> > >
> > >> > >
> > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > >> > > node, int spc, dump_flags_t flags,
> > >> > > pp_space (pp);
> > >> > >   }
> > >> > >   }
> > >> > > +   if (VECTOR_TYPE_P (TREE_TYPE (node))
> > >> > > +   && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant 
> > >> > > ())
> > >> > > + pp_string (pp, ", ... ");
> > >> > > pp_right_brace (pp);
> > >> > >
> > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > >> > Well, it got created for the following case after folding:
> > >> > svint32_t f2(int a, int b, int c, int d)
> > >> > {
> > >> >   int32x4_t v = {a, b, c, d};
> > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >> > }
> > >> >
> > >> > The svld1rq_s32 call gets folded to:
> > >> > v = {a, b, c, d}
> > >> > lhs = VEC_PERM_EXPR
> > >> >
> > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > >> > need_ctor is thus true:
> > >> > lhs = {a, b, c, d, ...}
> > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > >>
> > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > >> but do we actually make sure to do this duplication?
> > > I am not sure. As mentioned above, the current code-gen for VLA
> > > constructor looks pretty bad.
> > > Should we avoid folding VLA constructors for now ?
> >
> > VLA constructors aren't really a thing.  At least, the only VLA vector
> > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > sequence at the start of an otherwise zero vector.  I'm not sure
> > we even use that though (perhaps we do and I've forgotten).
> >
> > > I guess these are 2 different issues:
> > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > >
> > > For (a), I think the issue with using:
> > > res_type = gimple_assign_lhs (stmt)
> > > in previous patch, was that op2's type will change to match tgt_units,
> > > if we go thru
> > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > in fold_vec_perm.
> > >
> > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > following semantics:
> > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> >
> > Yeah.
> >
> > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > TYPE_VECTOR_SUBPARTS (op2))
> > > so it has same element type as arg0 (and arg1) and len of op2.
> > > Does that look reasonable ?
> > >
> > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > width vectors
> > > with len(lhs_type) being a multiple of len(res_type).
> > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> >
> > Yes, that's not supported as a cast.  If the compiler knows the
> > length of the "VLA" vector then it's not VLA.  If it doesn't
> > know the length of the VLA vector then the sizes could be different
> > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > different (preventing pointwise CONVERT_EXPRs).
> >
> > > or from VLA vector of one type to VL

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-17 Thread Richard Biener via Gcc-patches
On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 9 Aug 2022 at 18:42, Richard Biener  
> > wrote:
> >>
> >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener  
> >> > wrote:
> >> > >
> >> > >   /* If result vector has greater length than input vector,
> >> > > + then allow permuting two vectors as long as:
> >> > > + a) sel.nelts_per_pattern == 1
> >> > > + b) sel.npatterns == len of input vector.
> >> > > + The intent is to permute input vectors, and
> >> > > + dup the elements in resulting vector to target vector length.  */
> >> > > +
> >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> >> > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> >> > > +{
> >> > > +  nelts = sel.encoding ().npatterns ();
> >> > > +  if (sel.encoding ().nelts_per_pattern () != 1
> >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> >> > > +   return NULL_TREE;
> >> > > +}
> >> > >
> >> > > so the only case you add is non-VLA to VLA and there
> >> > > explicitly only the case of a period that's same as the
> >> > > element count in the input vectors.
> >> > >
> >> > >
> >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> >> > > node, int spc, dump_flags_t flags,
> >> > > pp_space (pp);
> >> > >   }
> >> > >   }
> >> > > +   if (VECTOR_TYPE_P (TREE_TYPE (node))
> >> > > +   && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> >> > > + pp_string (pp, ", ... ");
> >> > > pp_right_brace (pp);
> >> > >
> >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> >> > Well, it got created for the following case after folding:
> >> > svint32_t f2(int a, int b, int c, int d)
> >> > {
> >> >   int32x4_t v = {a, b, c, d};
> >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> >> > }
> >> >
> >> > The svld1rq_s32 call gets folded to:
> >> > v = {a, b, c, d}
> >> > lhs = VEC_PERM_EXPR
> >> >
> >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> >> > VLA constructor, since elements in v (in_elts) are not constant, and
> >> > need_ctor is thus true:
> >> > lhs = {a, b, c, d, ...}
> >> > I added "..." to make it more explicit that it's a VLA constructor.
> >>
> >> But I doubt we do anything reasonable with such a beast?  Do we?
> >> I suppose it's like a vec_duplicate if you view it as V1TImode
> >> but do we actually make sure to do this duplication?
> > I am not sure. As mentioned above, the current code-gen for VLA
> > constructor looks pretty bad.
> > Should we avoid folding VLA constructors for now ?
>
> VLA constructors aren't really a thing.  At least, the only VLA vector
> you could represent with current CONSTRUCTOR nodes is a fixed-length
> sequence at the start of an otherwise zero vector.  I'm not sure
> we even use that though (perhaps we do and I've forgotten).
>
> > I guess these are 2 different issues:
> > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> >
> > For (a), I think the issue with using:
> > res_type = gimple_assign_lhs (stmt)
> > in previous patch, was that op2's type will change to match tgt_units,
> > if we go thru
> > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > in fold_vec_perm.
> >
> > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > following semantics:
> > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
>
> Yeah.
>
> > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > TYPE_VECTOR_SUBPARTS (op2))
> > so it has same element type as arg0 (and arg1) and len of op2.
> > Does that look reasonable ?
> >
> > If we need a cast from res_type to lhs_type, then both would be fixed
> > width vectors
> > with len(lhs_type) being a multiple of len(res_type).
> > IIUC, we don't support casting from VLA vector to/from fixed width vector,
>
> Yes, that's not supported as a cast.  If the compiler knows the
> length of the "VLA" vector then it's not VLA.  If it doesn't
> know the length of the VLA vector then the sizes could be different
> (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> different (preventing pointwise CONVERT_EXPRs).
>
> > or from VLA vector of one type to VLA vector of other type ?
>
> That's supported though.  They work just like VLS vectors: if the sizes
> are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> is the same then we can do pointwise conversions (e.g. element-by-element
> extensions, truncations, conversions to float, convers

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-16 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 9 Aug 2022 at 18:42, Richard Biener  
> wrote:
>>
>> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
>>  wrote:
>> >
>> > On Mon, 8 Aug 2022 at 14:27, Richard Biener  
>> > wrote:
>> > >
>> > >   /* If result vector has greater length than input vector,
>> > > + then allow permuting two vectors as long as:
>> > > + a) sel.nelts_per_pattern == 1
>> > > + b) sel.npatterns == len of input vector.
>> > > + The intent is to permute input vectors, and
>> > > + dup the elements in resulting vector to target vector length.  */
>> > > +
>> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
>> > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
>> > > +{
>> > > +  nelts = sel.encoding ().npatterns ();
>> > > +  if (sel.encoding ().nelts_per_pattern () != 1
>> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
>> > > +   return NULL_TREE;
>> > > +}
>> > >
>> > > so the only case you add is non-VLA to VLA and there
>> > > explicitly only the case of a period that's same as the
>> > > element count in the input vectors.
>> > >
>> > >
>> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
>> > > node, int spc, dump_flags_t flags,
>> > > pp_space (pp);
>> > >   }
>> > >   }
>> > > +   if (VECTOR_TYPE_P (TREE_TYPE (node))
>> > > +   && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
>> > > + pp_string (pp, ", ... ");
>> > > pp_right_brace (pp);
>> > >
>> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
>> > Well, it got created for the following case after folding:
>> > svint32_t f2(int a, int b, int c, int d)
>> > {
>> >   int32x4_t v = {a, b, c, d};
>> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > }
>> >
>> > The svld1rq_s32 call gets folded to:
>> > v = {a, b, c, d}
>> > lhs = VEC_PERM_EXPR
>> >
>> > fold_vec_perm then folds the above VEC_PERM_EXPR to
>> > VLA constructor, since elements in v (in_elts) are not constant, and
>> > need_ctor is thus true:
>> > lhs = {a, b, c, d, ...}
>> > I added "..." to make it more explicit that it's a VLA constructor.
>>
>> But I doubt we do anything reasonable with such a beast?  Do we?
>> I suppose it's like a vec_duplicate if you view it as V1TImode
>> but do we actually make sure to do this duplication?
> I am not sure. As mentioned above, the current code-gen for VLA
> constructor looks pretty bad.
> Should we avoid folding VLA constructors for now ?

VLA constructors aren't really a thing.  At least, the only VLA vector
you could represent with current CONSTRUCTOR nodes is a fixed-length
sequence at the start of an otherwise zero vector.  I'm not sure
we even use that though (perhaps we do and I've forgotten).

> I guess these are 2 different issues:
> (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> (b) Extending fold_vec_perm to handle vectors with differing lengths.
>
> For (a), I think the issue with using:
> res_type = gimple_assign_lhs (stmt)
> in previous patch, was that op2's type will change to match tgt_units,
> if we go thru
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> and may thus not be same as len(lhs_type) anymore, and hit the assert
> in fold_vec_perm.
>
> IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> following semantics:
> (1) Element types for lhs, rhs1 and rhs2 should be the same.
> (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).

Yeah.
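
To make those two rules concrete, here is a minimal sketch in plain C of
how a VEC_PERM_EXPR with a longer result than its inputs can be read.  It
is a model only, not GCC code: model_vec_perm and expand_dup_selector are
hypothetical names, vectors are ordinary arrays, and the sketch asserts
that every index is in range instead of modelling out-of-range indices.
The selector expansion assumes the case discussed above (npatterns equal
to the input length, one element per pattern), where the encoded elements
simply repeat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of VEC_PERM_EXPR <a, b, sel>; not GCC code.
   A and B each hold IN_LEN elements (len(rhs1) == len(rhs2)).
   SEL and OUT each hold OUT_LEN elements (len(lhs) == len(mask)).
   Index K picks A[K] when K < IN_LEN, otherwise B[K - IN_LEN].  */
void
model_vec_perm (const int *a, const int *b, size_t in_len,
                const size_t *sel, int *out, size_t out_len)
{
  for (size_t i = 0; i < out_len; i++)
    {
      size_t k = sel[i];
      assert (k < 2 * in_len);  /* simplification: in-range only */
      out[i] = k < in_len ? a[k] : b[k - in_len];
    }
}

/* Expand a selector encoded as NPATTERNS patterns with one element
   per pattern: the encoded elements repeat to fill OUT_LEN slots.  */
void
expand_dup_selector (const size_t *encoded, size_t npatterns,
                     size_t *sel, size_t out_len)
{
  assert (npatterns > 0);
  for (size_t i = 0; i < out_len; i++)
    sel[i] = encoded[i % npatterns];
}
```

With a 4-element input and the encoded selector { 0, 1, 2, 3 }, expanding
to 8 slots and permuting yields the input repeated twice, which is the
"dup to target vector length" behaviour the svld1rq fold relies on.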

> The attached patch changes res_type from TREE_TYPE (arg0) to following:
> res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> TYPE_VECTOR_SUBPARTS (op2))
> so it has same element type as arg0 (and arg1) and len of op2.
> Does that look reasonable ?
>
> If we need a cast from res_type to lhs_type, then both would be fixed
> width vectors
> with len(lhs_type) being a multiple of len(res_type).
> IIUC, we don't support casting from VLA vector to/from fixed width vector,

Yes, that's not supported as a cast.  If the compiler knows the
length of the "VLA" vector then it's not VLA.  If it doesn't
know the length of the VLA vector then the sizes could be different
(preventing VIEW_CONVERT_EXPR) and the number of elements could be
different (preventing pointwise CONVERT_EXPRs).

> or from VLA vector of one type to VLA vector of other type ?

That's supported though.  They work just like VLS vectors: if the sizes
are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
is the same then we can do pointwise conversions (e.g. element-by-element
extensions, truncations, conversions to float, conversions to integer, etc).
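
As a rough illustration of the two kinds of conversion, here is a sketch
in plain C using fixed-size arrays as stand-ins for vectors.  The function
names are hypothetical and nothing here is GCC internals: the first
function mirrors a VIEW_CONVERT_EXPR (same total size, bits reinterpreted),
the second mirrors a pointwise conversion (same element count, each
element converted by value).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VIEW_CONVERT_EXPR analogue: both "vectors" are 16 bytes, so the
   bits are copied unchanged and only the element view differs.  */
void
view_convert_i32x4_to_u16x8 (const int32_t in[4], uint16_t out[8])
{
  assert (4 * sizeof (int32_t) == 8 * sizeof (uint16_t));
  memcpy (out, in, 4 * sizeof (int32_t));
}

/* Pointwise conversion analogue: both "vectors" have 4 elements, so
   each element is converted to float individually.  */
void
convert_i32x4_to_f32x4 (const int32_t in[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (float) in[i];
}
```

Neither form has an obvious meaning between a fixed-length vector and a
VLA one, since the sizes and the element counts are not known to match.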

> Currently, if op2 is VLA, and we enter the branch:
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> then I think it will bail out because op2_units will not be a compile
> time constant,
> and constant_multiple_p (op2_units, tgt_units, &facto

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-11 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 9 Aug 2022 at 18:42, Richard Biener  wrote:
>
> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 8 Aug 2022 at 14:27, Richard Biener  
> > wrote:
> > >
> > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Richard Biener  writes:
> > > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > > >  wrote:
> > > > > > > > > >>
> > > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > > > > >>  wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > > > > >> > Gcc-patches
> > > > > > > > > >> >  wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > Hi Richard,
> > > > > > > > > >> > > For the following test:
> > > > > > > > > >> > >
> > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > > >> > > {
> > > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > > >> > > }
> > > > > > > > > >> > >
> > > > > > > > > >> > > The compiler emits following ICE with -O3 
> > > > > > > > > >> > > -mcpu=generic+sve:
> > > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > > > > >> > > ‘view_convert_expr’
> > > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > > >> > >   |   ^~
> > > > > > > > > >> > > svint32_t
> > > > > > > > > >> > > __Int32x4_t
> > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple 
> > > > > > > > > >> > > failed
> > > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > > > > >> > >
> > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > > > >> > >   int32x4_t v;
> > > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > > >> > >   svint32_t _9;
> > > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > > >> > >
> > > > > > > > > >> > >:
> > > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > > >> > >   v_12 = _1;
> > > > > > > > > >> > >   _11 = v_12;
> > > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > > >> > >   return _9;
> > > > > > > > > >> > >
> > > > > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > > > > >> > > vec_perm_expr to
> > > > > > > > > >> > > view_convert_expr,
> > > > > > > > > >> > > and the end result becomes:
> > > > > > > > > >> > >   svint32_t _7;
> > > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > > >> > >
> > > > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > > > >> > > ;;pred:   ENTRY
> > > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > > >> > >   return _7;
> > > > > > > > > >> > > ;;succ:   EXIT
> > > > > > > > > >> > >
> > > > > > > > > > >> > > which causes the error during verify_gimple since 
> > > > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > > > >> > >
> > > > > > > > > >> > > The attached patch disables simplification of 
> > > > > > > > > >> > > VEC_PERM_EXPR
> > > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non 
> > > > > > > > > >> > > compatible types,
> > > > > > > > > >> > > which resolves ICE, but am not sure if it's the 
> > > > > > > > > >> > > correct approach ?
> > > > > > > > > >> >
> > > > > > > > > >> > It for sure papers over the issue.  I think the error 
> > > > > > > > > >> > happens earlier,
> > > > > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > > > > >> > VEC_PERM_EXPR
> > > > > > > > > >> > which is the type of the LHS.  But then you probably run 
> > > > > > > > > >> > into the
> > > > > > > > > >> > di

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-09 Thread Richard Biener via Gcc-patches
On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 8 Aug 2022 at 14:27, Richard Biener  
> wrote:
> >
> > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 21 Jul 2022 at 12:21, Richard Biener  
> > > wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Richard Biener  writes:
> > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > > > >>  wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > > > >> > Gcc-patches
> > > > > > > > >> >  wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi Richard,
> > > > > > > > >> > > For the following test:
> > > > > > > > >> > >
> > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > > {
> > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > > }
> > > > > > > > >> > >
> > > > > > > > >> > > The compiler emits following ICE with -O3 
> > > > > > > > >> > > -mcpu=generic+sve:
> > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > > > >> > > ‘view_convert_expr’
> > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > >   |   ^~
> > > > > > > > >> > > svint32_t
> > > > > > > > >> > > __Int32x4_t
> > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > > > >> > >
> > > > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > > >> > >   int32x4_t v;
> > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > >> > >   svint32_t _9;
> > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > >> > >
> > > > > > > > >> > >:
> > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > >> > >   v_12 = _1;
> > > > > > > > >> > >   _11 = v_12;
> > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > >> > >   return _9;
> > > > > > > > >> > >
> > > > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > > > >> > > vec_perm_expr to
> > > > > > > > >> > > view_convert_expr,
> > > > > > > > >> > > and the end result becomes:
> > > > > > > > >> > >   svint32_t _7;
> > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > >> > >
> > > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > > >> > > ;;pred:   ENTRY
> > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > >   return _7;
> > > > > > > > >> > > ;;succ:   EXIT
> > > > > > > > >> > >
> > > > > > > > > >> > > which causes the error during verify_gimple since 
> > > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > > >> > >
> > > > > > > > >> > > The attached patch disables simplification of 
> > > > > > > > >> > > VEC_PERM_EXPR
> > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non 
> > > > > > > > >> > > compatible types,
> > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct 
> > > > > > > > >> > > approach ?
> > > > > > > > >> >
> > > > > > > > >> > It for sure papers over the issue.  I think the error 
> > > > > > > > >> > happens earlier,
> > > > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > > > >> > VEC_PERM_EXPR
> > > > > > > > >> > which is the type of the LHS.  But then you probably run 
> > > > > > > > >> > into the
> > > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for 
> > > > > > > > >> > this case you
> > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > > >> Hi Richard,
> > > > > > > > >> Sorry I don't quite 

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-09 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 8 Aug 2022 at 14:27, Richard Biener  wrote:
>
> On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 21 Jul 2022 at 12:21, Richard Biener  
> > wrote:
> > >
> > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > >  wrote:
> > > > > > >
> > > > > > > Richard Biener  writes:
> > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > > >>  wrote:
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > > >> > Gcc-patches
> > > > > > > >> >  wrote:
> > > > > > > >> > >
> > > > > > > >> > > Hi Richard,
> > > > > > > >> > > For the following test:
> > > > > > > >> > >
> > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > > {
> > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >> > > }
> > > > > > > >> > >
> > > > > > > >> > > The compiler emits following ICE with -O3 
> > > > > > > >> > > -mcpu=generic+sve:
> > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > > >> > > ‘view_convert_expr’
> > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > >   |   ^~
> > > > > > > >> > > svint32_t
> > > > > > > >> > > __Int32x4_t
> > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > > >> > >
> > > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > >> > >   int32x4_t v;
> > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > >> > >   svint32_t _9;
> > > > > > > >> > >   vector(4) int _11;
> > > > > > > >> > >
> > > > > > > >> > >:
> > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > >> > >   v_12 = _1;
> > > > > > > >> > >   _11 = v_12;
> > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > >> > >   return _9;
> > > > > > > >> > >
> > > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > > >> > > vec_perm_expr to
> > > > > > > >> > > view_convert_expr,
> > > > > > > >> > > and the end result becomes:
> > > > > > > >> > >   svint32_t _7;
> > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > >> > >
> > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > >> > > ;;pred:   ENTRY
> > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > >   return _7;
> > > > > > > >> > > ;;succ:   EXIT
> > > > > > > >> > >
> > > > > > > > >> > > which causes the error during verify_gimple since 
> > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > >> > >
> > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > >> > > in simplify_permutation, if lhs and rhs have non 
> > > > > > > >> > > compatible types,
> > > > > > > >> > > which resolves ICE, but am not sure if it's the correct 
> > > > > > > >> > > approach ?
> > > > > > > >> >
> > > > > > > >> > It for sure papers over the issue.  I think the error 
> > > > > > > >> > happens earlier,
> > > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > > >> > VEC_PERM_EXPR
> > > > > > > >> > which is the type of the LHS.  But then you probably run 
> > > > > > > >> > into the
> > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for 
> > > > > > > >> > this case you
> > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > >> Hi Richard,
> > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR 
> > > > > > > >> to
> > > > > > > >> represent dup operation
> > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > > >> BIT_FIELD_REF will work.
> > > > > > > >> Could you please elaborate ?
> > > > > > > >>
> > > > > > > >> A

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-08 Thread Richard Biener via Gcc-patches
On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
 wrote:
>
> On Thu, 21 Jul 2022 at 12:21, Richard Biener  
> wrote:
> >
> > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 18 Jul 2022 at 11:57, Richard Biener  
> > > wrote:
> > > >
> > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Richard Biener  writes:
> > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > >>  wrote:
> > > > > > >> >
> > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > >> > Gcc-patches
> > > > > > >> >  wrote:
> > > > > > >> > >
> > > > > > >> > > Hi Richard,
> > > > > > >> > > For the following test:
> > > > > > >> > >
> > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > > {
> > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >> > > }
> > > > > > >> > >
> > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > >> > > ‘view_convert_expr’
> > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > >   |   ^~
> > > > > > >> > > svint32_t
> > > > > > >> > > __Int32x4_t
> > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > >> > > 0xe93ccb execute_todo
> > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > >> > >
> > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > >> > > vec_perm_expr, we have:
> > > > > > >> > >   int32x4_t v;
> > > > > > >> > >   __Int32x4_t _1;
> > > > > > >> > >   svint32_t _9;
> > > > > > >> > >   vector(4) int _11;
> > > > > > >> > >
> > > > > > >> > >:
> > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > >> > >   v_12 = _1;
> > > > > > >> > >   _11 = v_12;
> > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > >> > >   return _9;
> > > > > > >> > >
> > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > >> > > vec_perm_expr to
> > > > > > >> > > view_convert_expr,
> > > > > > >> > > and the end result becomes:
> > > > > > >> > >   svint32_t _7;
> > > > > > >> > >   __Int32x4_t _8;
> > > > > > >> > >
> > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > >> > > ;;pred:   ENTRY
> > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > >> > >   return _7;
> > > > > > >> > > ;;succ:   EXIT
> > > > > > >> > >
> > > > > > > >> > > which causes the error during verify_gimple since 
> > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > >> > >
> > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible 
> > > > > > >> > > types,
> > > > > > >> > > which resolves ICE, but am not sure if it's the correct 
> > > > > > >> > > approach ?
> > > > > > >> >
> > > > > > >> > It for sure papers over the issue.  I think the error happens 
> > > > > > >> > earlier,
> > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > >> > VEC_PERM_EXPR
> > > > > > >> > which is the type of the LHS.  But then you probably run into 
> > > > > > >> > the
> > > > > > >> > different sizes ICE (VLA vs constant size).  I think for this 
> > > > > > >> > case you
> > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > >> Hi Richard,
> > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > > >> represent dup operation
> > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > >> BIT_FIELD_REF will work.
> > > > > > >> Could you please elaborate ?
> > > > > > >>
> > > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > > >> The following test case also ICE's during forwprop:
> > > > > > >> svint32_t foo()
> > > > > > >> {
> > > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >>   return v2;

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-31 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 21 Jul 2022 at 12:21, Richard Biener  wrote:
>
> On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 18 Jul 2022 at 11:57, Richard Biener  
> > wrote:
> > >
> > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Richard Biener  writes:
> > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > >>
> > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > >>  wrote:
> > > > > >> >
> > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > >> > Gcc-patches
> > > > > >> >  wrote:
> > > > > >> > >
> > > > > >> > > Hi Richard,
> > > > > >> > > For the following test:
> > > > > >> > >
> > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > >> > > {
> > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >> > > }
> > > > > >> > >
> > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > >> > > foo.c: In function ‘f2’:
> > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > >> > > ‘view_convert_expr’
> > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > >> > >   |   ^~
> > > > > >> > > svint32_t
> > > > > >> > > __Int32x4_t
> > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > > during GIMPLE pass: forwprop
> > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > >> > > 0xe9371f execute_function_todo
> > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > >> > > 0xe93ccb execute_todo
> > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > >> > >
> > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > >> > > vec_perm_expr, we have:
> > > > > >> > >   int32x4_t v;
> > > > > >> > >   __Int32x4_t _1;
> > > > > >> > >   svint32_t _9;
> > > > > >> > >   vector(4) int _11;
> > > > > >> > >
> > > > > >> > >:
> > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > >> > >   v_12 = _1;
> > > > > >> > >   _11 = v_12;
> > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > >> > >   return _9;
> > > > > >> > >
> > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr 
> > > > > >> > > to
> > > > > >> > > view_convert_expr,
> > > > > >> > > and the end result becomes:
> > > > > >> > >   svint32_t _7;
> > > > > >> > >   __Int32x4_t _8;
> > > > > >> > >
> > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > >> > > ;;pred:   ENTRY
> > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > >   return _7;
> > > > > >> > > ;;succ:   EXIT
> > > > > >> > >
> > > > > > >> > > which causes the error during verify_gimple since 
> > > > > >> > > VIEW_CONVERT_EXPR
> > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > >> > >
> > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible 
> > > > > >> > > types,
> > > > > >> > > which resolves ICE, but am not sure if it's the correct 
> > > > > >> > > approach ?
> > > > > >> >
> > > > > >> > It for sure papers over the issue.  I think the error happens 
> > > > > >> > earlier,
> > > > > >> > the V_C_E should have been built with the type of the 
> > > > > >> > VEC_PERM_EXPR
> > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > >> > different sizes ICE (VLA vs constant size).  I think for this 
> > > > > >> > case you
> > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > >> > selecting the "low" part of the VLA vector.
> > > > > >> Hi Richard,
> > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > >> represent dup operation
> > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > >> BIT_FIELD_REF will work.
> > > > > >> Could you please elaborate ?
> > > > > >>
> > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > >> The following test case also ICE's during forwprop:
> > > > > >> svint32_t foo()
> > > > > >> {
> > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >>   return v2;
> > > > > >> }
> > > > > >>
> > > > > >> foo2.c: In function ‘foo’:
> > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > >> 9 | }
> > > > > >>   | ^
> > > > > >> svint32_t
> > > > > >> int32x4_t
> > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > >>
> > > > > >> because simplify

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-20 Thread Richard Biener via Gcc-patches
On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 18 Jul 2022 at 11:57, Richard Biener  
> wrote:
> >
> > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > >  wrote:
> > > >
> > > > Richard Biener  writes:
> > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > >>
> > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > >>  wrote:
> > > > >> >
> > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > >> >  wrote:
> > > > >> > >
> > > > >> > > Hi Richard,
> > > > >> > > For the following test:
> > > > >> > >
> > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > >> > > {
> > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >> > > }
> > > > >> > >
> > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > >> > > foo.c: In function ‘f2’:
> > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > >> > >   |   ^~
> > > > >> > > svint32_t
> > > > >> > > __Int32x4_t
> > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > > during GIMPLE pass: forwprop
> > > > >> > > dump file: foo.c.109t.forwprop2
> > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > >> > > 0xe9371f execute_function_todo
> > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > >> > > 0xe93ccb execute_todo
> > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > >> > >
> > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > >> > > vec_perm_expr, we have:
> > > > >> > >   int32x4_t v;
> > > > >> > >   __Int32x4_t _1;
> > > > >> > >   svint32_t _9;
> > > > >> > >   vector(4) int _11;
> > > > >> > >
> > > > >> > >:
> > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > >> > >   v_12 = _1;
> > > > >> > >   _11 = v_12;
> > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > >> > >   return _9;
> > > > >> > >
> > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > >> > > view_convert_expr,
> > > > >> > > and the end result becomes:
> > > > >> > >   svint32_t _7;
> > > > >> > >   __Int32x4_t _8;
> > > > >> > >
> > > > >> > > ;;   basic block 2, loop depth 0
> > > > >> > > ;;pred:   ENTRY
> > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > >   return _7;
> > > > >> > > ;;succ:   EXIT
> > > > >> > >
> > > > > >> > > which causes the error during verify_gimple since 
> > > > >> > > VIEW_CONVERT_EXPR
> > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > >> > >
> > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > >> > > in simplify_permutation, if lhs and rhs have non compatible 
> > > > >> > > types,
> > > > >> > > which resolves ICE, but am not sure if it's the correct approach 
> > > > >> > > ?
> > > > >> >
> > > > >> > It for sure papers over the issue.  I think the error happens 
> > > > >> > earlier,
> > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > >> > different sizes ICE (VLA vs constant size).  I think for this case 
> > > > >> > you
> > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > >> > selecting the "low" part of the VLA vector.
> > > > >> Hi Richard,
> > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > >> represent dup operation
> > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > >> BIT_FIELD_REF will work.
> > > > >> Could you please elaborate ?
> > > > >>
> > > > >> Also, the issue doesn't seem restricted to this case.
> > > > >> The following test case also ICE's during forwprop:
> > > > >> svint32_t foo()
> > > > >> {
> > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >>   return v2;
> > > > >> }
> > > > >>
> > > > >> foo2.c: In function ‘foo’:
> > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > >> 9 | }
> > > > >>   | ^
> > > > >> svint32_t
> > > > >> int32x4_t
> > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > >>
> > > > >> because simplify_permutation folds
> > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > >> into:
> > > > >> vector_cst {1, 2, 3, 4}
> > > > >>
> > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > >> support assignment of vector_cst to VLA vector.
> > > > >>
> > > > >> I guess the issue really is that currently, 

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-20 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 18 Jul 2022 at 11:57, Richard Biener  wrote:
>
> On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > >  wrote:
> > > >>
> > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > >>  wrote:
> > > >> >
> > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > >> >  wrote:
> > > >> > >
> > > >> > > Hi Richard,
> > > >> > > For the following test:
> > > >> > >
> > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > >> > > {
> > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > > }
> > > >> > >
> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > >> > > foo.c: In function ‘f2’:
> > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > >> > >   |   ^~
> > > >> > > svint32_t
> > > >> > > __Int32x4_t
> > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > > during GIMPLE pass: forwprop
> > > >> > > dump file: foo.c.109t.forwprop2
> > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > >> > > 0xe9371f execute_function_todo
> > > >> > > ../../gcc/gcc/passes.cc:2091
> > > >> > > 0xe93ccb execute_todo
> > > >> > > ../../gcc/gcc/passes.cc:2145
> > > >> > >
> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, 
> > > >> > > we have:
> > > >> > >   int32x4_t v;
> > > >> > >   __Int32x4_t _1;
> > > >> > >   svint32_t _9;
> > > >> > >   vector(4) int _11;
> > > >> > >
> > > >> > >:
> > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > >> > >   v_12 = _1;
> > > >> > >   _11 = v_12;
> > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > >> > >   return _9;
> > > >> > >
> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > >> > > view_convert_expr,
> > > >> > > and the end result becomes:
> > > >> > >   svint32_t _7;
> > > >> > >   __Int32x4_t _8;
> > > >> > >
> > > >> > > ;;   basic block 2, loop depth 0
> > > >> > > ;;pred:   ENTRY
> > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > >   return _7;
> > > >> > > ;;succ:   EXIT
> > > >> > >
> > > > >> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > >> > >
> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > >> >
> > > >> > It for sure papers over the issue.  I think the error happens 
> > > >> > earlier,
> > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > >> > which is the type of the LHS.  But then you probably run into the
> > > >> > different sizes ICE (VLA vs constant size).  I think for this case 
> > > >> > you
> > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > >> > selecting the "low" part of the VLA vector.
> > > >> Hi Richard,
> > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > >> represent dup operation
> > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > >> BIT_FIELD_REF will work.
> > > >> Could you please elaborate ?
> > > >>
> > > >> Also, the issue doesn't seem restricted to this case.
> > > >> The following test case also ICE's during forwprop:
> > > >> svint32_t foo()
> > > >> {
> > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >>   return v2;
> > > >> }
> > > >>
> > > >> foo2.c: In function ‘foo’:
> > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > >> 9 | }
> > > >>   | ^
> > > >> svint32_t
> > > >> int32x4_t
> > > >> v2_4 = { 1, 2, 3, 4 };
> > > >>
> > > >> because simplify_permutation folds
> > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > >> into:
> > > >> vector_cst {1, 2, 3, 4}
> > > >>
> > > >> and it complains during verify_gimple_assign_single because we don't
> > > >> support assignment of vector_cst to VLA vector.
> > > >>
> > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > >> supports lhs and rhs
> > > >> to have vector types with differing lengths, and simplifying it to
> > > >> other tree codes, like above,
> > > >> will result in type errors ?
> > > >
> > > > That might be the case - Richard should know.
> > >
> > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > or 

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-17 Thread Richard Biener via Gcc-patches
On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > >  wrote:
> > >>
> > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > >>  wrote:
> > >> >
> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > >> >  wrote:
> > >> > >
> > >> > > Hi Richard,
> > >> > > For the following test:
> > >> > >
> > >> > > svint32_t f2(int a, int b, int c, int d)
> > >> > > {
> > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >> > > }
> > >> > >
> > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > >> > > foo.c: In function ‘f2’:
> > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > >> > >   |   ^~
> > >> > > svint32_t
> > >> > > __Int32x4_t
> > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > > during GIMPLE pass: forwprop
> > >> > > dump file: foo.c.109t.forwprop2
> > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > >> > > 0xe9371f execute_function_todo
> > >> > > ../../gcc/gcc/passes.cc:2091
> > >> > > 0xe93ccb execute_todo
> > >> > > ../../gcc/gcc/passes.cc:2145
> > >> > >
> > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we 
> > >> > > have:
> > >> > >   int32x4_t v;
> > >> > >   __Int32x4_t _1;
> > >> > >   svint32_t _9;
> > >> > >   vector(4) int _11;
> > >> > >
> > >> > >:
> > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >> > >   v_12 = _1;
> > >> > >   _11 = v_12;
> > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >> > >   return _9;
> > >> > >
> > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > >> > > view_convert_expr,
> > >> > > and the end result becomes:
> > >> > >   svint32_t _7;
> > >> > >   __Int32x4_t _8;
> > >> > >
> > >> > > ;;   basic block 2, loop depth 0
> > >> > > ;;pred:   ENTRY
> > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > >   return _7;
> > >> > > ;;succ:   EXIT
> > >> > >
> > > >> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > >> > > has incompatible types (svint32_t, int32x4_t).
> > >> > >
> > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > >> >
> > >> > It for sure papers over the issue.  I think the error happens earlier,
> > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > >> > which is the type of the LHS.  But then you probably run into the
> > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > >> > selecting the "low" part of the VLA vector.
> > >> Hi Richard,
> > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > >> represent dup operation
> > >> from fixed width to VLA vector. I am not sure how folding it to
> > >> BIT_FIELD_REF will work.
> > >> Could you please elaborate ?
> > >>
> > >> Also, the issue doesn't seem restricted to this case.
> > >> The following test case also ICE's during forwprop:
> > >> svint32_t foo()
> > >> {
> > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >>   return v2;
> > >> }
> > >>
> > >> foo2.c: In function ‘foo’:
> > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > >> 9 | }
> > >>   | ^
> > >> svint32_t
> > >> int32x4_t
> > >> v2_4 = { 1, 2, 3, 4 };
> > >>
> > >> because simplify_permutation folds
> > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > >> into:
> > >> vector_cst {1, 2, 3, 4}
> > >>
> > >> and it complains during verify_gimple_assign_single because we don't
> > >> support assignment of vector_cst to VLA vector.
> > >>
> > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > >> supports lhs and rhs
> > >> to have vector types with differing lengths, and simplifying it to
> > >> other tree codes, like above,
> > >> will result in type errors ?
> > >
> > > That might be the case - Richard should know.
> >
> > I don't see anything particularly special about VEC_PERM_EXPR here,
> > or about the VLA vs. VLS thing.  We would have the same issue trying
> > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > tree codes whose input types are/can be different from their output
> > types.
> >
> > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > of type T needs to be r

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-15 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener  
> >> wrote:
> >> >
> >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> >> >  wrote:
> >> > >
> >> > > Hi Richard,
> >> > > For the following test:
> >> > >
> >> > > svint32_t f2(int a, int b, int c, int d)
> >> > > {
> >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> >> > > }
> >> > >
> >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> >> > > foo.c: In function ‘f2’:
> >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> >> > >   |   ^~
> >> > > svint32_t
> >> > > __Int32x4_t
> >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >> > > during GIMPLE pass: forwprop
> >> > > dump file: foo.c.109t.forwprop2
> >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> >> > > ../../gcc/gcc/tree-cfg.cc:5568
> >> > > 0xe9371f execute_function_todo
> >> > > ../../gcc/gcc/passes.cc:2091
> >> > > 0xe93ccb execute_todo
> >> > > ../../gcc/gcc/passes.cc:2145
> >> > >
> >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we 
> >> > > have:
> >> > >   int32x4_t v;
> >> > >   __Int32x4_t _1;
> >> > >   svint32_t _9;
> >> > >   vector(4) int _11;
> >> > >
> >> > >:
> >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >> > >   v_12 = _1;
> >> > >   _11 = v_12;
> >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >> > >   return _9;
> >> > >
> >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> >> > > view_convert_expr,
> >> > > and the end result becomes:
> >> > >   svint32_t _7;
> >> > >   __Int32x4_t _8;
> >> > >
> >> > > ;;   basic block 2, loop depth 0
> >> > > ;;pred:   ENTRY
> >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >> > >   return _7;
> >> > > ;;succ:   EXIT
> >> > >
> > >> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> >> > > has incompatible types (svint32_t, int32x4_t).
> >> > >
> >> > > The attached patch disables simplification of VEC_PERM_EXPR
> >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> >> > > which resolves ICE, but am not sure if it's the correct approach ?
> >> >
> >> > It for sure papers over the issue.  I think the error happens earlier,
> >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> >> > which is the type of the LHS.  But then you probably run into the
> >> > different sizes ICE (VLA vs constant size).  I think for this case you
> >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> >> > selecting the "low" part of the VLA vector.
> >> Hi Richard,
> >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> >> represent dup operation
> >> from fixed width to VLA vector. I am not sure how folding it to
> >> BIT_FIELD_REF will work.
> >> Could you please elaborate ?
> >>
> >> Also, the issue doesn't seem restricted to this case.
> >> The following test case also ICE's during forwprop:
> >> svint32_t foo()
> >> {
> >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> >>   return v2;
> >> }
> >>
> >> foo2.c: In function ‘foo’:
> >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> >> 9 | }
> >>   | ^
> >> svint32_t
> >> int32x4_t
> >> v2_4 = { 1, 2, 3, 4 };
> >>
> >> because simplify_permutation folds
> >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> >> into:
> >> vector_cst {1, 2, 3, 4}
> >>
> >> and it complains during verify_gimple_assign_single because we don't
> >> support assignment of vector_cst to VLA vector.
> >>
> >> I guess the issue really is that currently, only VEC_PERM_EXPR
> >> supports lhs and rhs
> >> to have vector types with differing lengths, and simplifying it to
> >> other tree codes, like above,
> >> will result in type errors ?
> >
> > That might be the case - Richard should know.
>
> I don't see anything particularly special about VEC_PERM_EXPR here,
> or about the VLA vs. VLS thing.  We would have the same issue trying
> to build a 128-bit vector from 2 64-bit vectors.  And there are other
> tree codes whose input types are/can be different from their output
> types.
>
> So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> of type T needs to be replaced by something of type T.  Whether T has a
> constant size or a variable size doesn't matter.
>
> > If so your type check
> > is still too late, you should instead recognize that we are permuting
> > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > paths - that probably means just allow

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
>  wrote:
>>
>> On Wed, 13 Jul 2022 at 12:22, Richard Biener  
>> wrote:
>> >
>> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
>> >  wrote:
>> > >
>> > > Hi Richard,
>> > > For the following test:
>> > >
>> > > svint32_t f2(int a, int b, int c, int d)
>> > > {
>> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
>> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > > }
>> > >
>> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
>> > > foo.c: In function ‘f2’:
>> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
>> > > 4 | svint32_t f2(int a, int b, int c, int d)
>> > >   |   ^~
>> > > svint32_t
>> > > __Int32x4_t
>> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > > during GIMPLE pass: forwprop
>> > > dump file: foo.c.109t.forwprop2
>> > > foo.c:4:11: internal compiler error: verify_gimple failed
>> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
>> > > ../../gcc/gcc/tree-cfg.cc:5568
>> > > 0xe9371f execute_function_todo
>> > > ../../gcc/gcc/passes.cc:2091
>> > > 0xe93ccb execute_todo
>> > > ../../gcc/gcc/passes.cc:2145
>> > >
>> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we 
>> > > have:
>> > >   int32x4_t v;
>> > >   __Int32x4_t _1;
>> > >   svint32_t _9;
>> > >   vector(4) int _11;
>> > >
>> > >:
>> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>> > >   v_12 = _1;
>> > >   _11 = v_12;
>> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>> > >   return _9;
>> > >
>> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
>> > > view_convert_expr,
>> > > and the end result becomes:
>> > >   svint32_t _7;
>> > >   __Int32x4_t _8;
>> > >
>> > > ;;   basic block 2, loop depth 0
>> > > ;;pred:   ENTRY
>> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > >   return _7;
>> > > ;;succ:   EXIT
>> > >
>> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
>> > > has incompatible types (svint32_t, int32x4_t).
>> > >
>> > > The attached patch disables simplification of VEC_PERM_EXPR
>> > > in simplify_permutation, if lhs and rhs have non compatible types,
>> > > which resolves ICE, but am not sure if it's the correct approach ?
>> >
>> > It for sure papers over the issue.  I think the error happens earlier,
>> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
>> > which is the type of the LHS.  But then you probably run into the
>> > different sizes ICE (VLA vs constant size).  I think for this case you
>> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
>> > selecting the "low" part of the VLA vector.
>> Hi Richard,
>> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
>> represent dup operation
>> from fixed width to VLA vector. I am not sure how folding it to
>> BIT_FIELD_REF will work.
>> Could you please elaborate ?
>>
>> Also, the issue doesn't seem restricted to this case.
>> The following test case also ICE's during forwprop:
>> svint32_t foo()
>> {
>>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
>>   return v2;
>> }
>>
>> foo2.c: In function ‘foo’:
>> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
>> 9 | }
>>   | ^
>> svint32_t
>> int32x4_t
>> v2_4 = { 1, 2, 3, 4 };
>>
>> because simplify_permutation folds
>> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
>> into:
>> vector_cst {1, 2, 3, 4}
>>
>> and it complains during verify_gimple_assign_single because we don't
>> support assignment of vector_cst to VLA vector.
>>
>> I guess the issue really is that currently, only VEC_PERM_EXPR
>> supports lhs and rhs
>> to have vector types with differing lengths, and simplifying it to
>> other tree codes, like above,
>> will result in type errors ?
>
> That might be the case - Richard should know.

I don't see anything particularly special about VEC_PERM_EXPR here,
or about the VLA vs. VLS thing.  We would have the same issue trying
to build a 128-bit vector from 2 64-bit vectors.  And there are other
tree codes whose input types are/can be different from their output
types.

So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
of type T needs to be replaced by something of type T.  Whether T has a
constant size or a variable size doesn't matter.
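
To make that rule concrete, here is a rough sketch in plain C.  It is a
toy model and not GCC code: toy_type, toy_types_compatible, toy_is_series
and toy_fold_perm are made-up names that only stand in for tree types, a
type compatibility check, a selector series test and the permute fold.
The point it tries to show is that the fold to "just use one input" is
only taken when the input already has the result type T.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vector type: element width in bits plus
   element count.  Real GCC types are trees, not structs like this.  */
struct toy_type
{
  unsigned elt_bits;
  unsigned nelts;
};

/* Toy compatibility check: same element width and same element count.  */
bool
toy_types_compatible (struct toy_type a, struct toy_type b)
{
  return a.elt_bits == b.elt_bits && a.nelts == b.nelts;
}

/* Return true if SEL[0..N-1] is BASE, BASE + 1, BASE + 2, ...  */
bool
toy_is_series (const unsigned *sel, size_t n, unsigned base)
{
  for (size_t i = 0; i < n; i++)
    if (sel[i] != base + i)
      return false;
  return true;
}

/* Decide whether a permute of two INPUT-typed operands with selector SEL
   can be replaced by one of the operands.  Return 0 or 1 for the operand
   to use, or -1 to keep the permute.  The replacement must have the
   RESULT type, so an incompatible input type blocks the fold.  */
int
toy_fold_perm (struct toy_type result, struct toy_type input,
               const unsigned *sel, size_t nsel)
{
  assert (sel != NULL);

  if (!toy_types_compatible (result, input))
    return -1;
  if (toy_is_series (sel, nsel, 0))
    return 0;
  if (toy_is_series (sel, nsel, input.nelts))
    return 1;
  return -1;
}
```

In this model a 4-element input permuted into a 4-element result with
selector { 0, 1, 2, 3 } folds to operand 0, while the same input permuted
into an 8-element result (standing in for the longer SVE vector) is left
as a permute, whatever the selector looks like.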

> If so your type check
> is still too late, you should instead recognize that we are permuting
> a VLA vector and then refuse to go any of the non-VEC_PERM generating
> paths - that probably means just allowing the code == VEC_PERM_EXPR
> case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Yeah.  If we're talking about the match.pd code, I think only:

  (if (sel.series_p (0, 1, 0, 1))
   { op0; }
   (if (sel.series_p (0, 1, nelts, 1))
    { op1; }

need a type compatibility check.  For fold_vec_perm I think
we 

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
 wrote:
>
> On Wed, 13 Jul 2022 at 12:22, Richard Biener  
> wrote:
> >
> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > Hi Richard,
> > > For the following test:
> > >
> > > svint32_t f2(int a, int b, int c, int d)
> > > {
> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > }
> > >
> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > foo.c: In function ‘f2’:
> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > >   |   ^~
> > > svint32_t
> > > __Int32x4_t
> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > during GIMPLE pass: forwprop
> > > dump file: foo.c.109t.forwprop2
> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > 0xe9371f execute_function_todo
> > > ../../gcc/gcc/passes.cc:2091
> > > 0xe93ccb execute_todo
> > > ../../gcc/gcc/passes.cc:2145
> > >
> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > >   int32x4_t v;
> > >   __Int32x4_t _1;
> > >   svint32_t _9;
> > >   vector(4) int _11;
> > >
> > >   <bb 2> :
> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >   v_12 = _1;
> > >   _11 = v_12;
> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >   return _9;
> > >
> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > view_convert_expr,
> > > and the end result becomes:
> > >   svint32_t _7;
> > >   __Int32x4_t _8;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > > has incompatible types (svint32_t, int32x4_t).
> > >
> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > which resolves ICE, but am not sure if it's the correct approach ?
> >
> > It for sure papers over the issue.  I think the error happens earlier,
> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > which is the type of the LHS.  But then you probably run into the
> > different sizes ICE (VLA vs constant size).  I think for this case you
> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > selecting the "low" part of the VLA vector.
> Hi Richard,
> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> represent dup operation
> from fixed width to VLA vector. I am not sure how folding it to
> BIT_FIELD_REF will work.
> Could you please elaborate ?
>
> Also, the issue doesn't seem restricted to this case.
> The following test case also ICE's during forwprop:
> svint32_t foo()
> {
>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
>   return v2;
> }
>
> foo2.c: In function ‘foo’:
> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> 9 | }
>   | ^
> svint32_t
> int32x4_t
> v2_4 = { 1, 2, 3, 4 };
>
> because simplify_permutation folds
> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> into:
> vector_cst {1, 2, 3, 4}
>
> and it complains during verify_gimple_assign_single because we don't
> support assignment of vector_cst to VLA vector.
>
> I guess the issue really is that currently, only VEC_PERM_EXPR
> supports lhs and rhs
> to have vector types with differing lengths, and simplifying it to
> other tree codes, like above,
> will result in type errors ?

That might be the case - Richard should know.  If so your type check
is still too late, you should instead recognize that we are permuting
a VLA vector and then refuse to go any of the non-VEC_PERM generating
paths - that probably means just allowing the code == VEC_PERM_EXPR
case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Richard.

>
> Thanks,
> Prathamesh
> >
> > >
> > > Alternatively, should we allow assignments from fixed-width to SVE
> > > vector, so the above
> > > VIEW_CONVERT_EXPR would result in dup ?
> > >
> > > Thanks,
> > > Prathamesh


Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-14 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 13 Jul 2022 at 12:22, Richard Biener  wrote:
>
> On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi Richard,
> > For the following test:
> >
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > foo.c: In function ‘f2’:
> > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > 4 | svint32_t f2(int a, int b, int c, int d)
> >   |   ^~
> > svint32_t
> > __Int32x4_t
> > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > during GIMPLE pass: forwprop
> > dump file: foo.c.109t.forwprop2
> > foo.c:4:11: internal compiler error: verify_gimple failed
> > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > ../../gcc/gcc/tree-cfg.cc:5568
> > 0xe9371f execute_function_todo
> > ../../gcc/gcc/passes.cc:2091
> > 0xe93ccb execute_todo
> > ../../gcc/gcc/passes.cc:2145
> >
> > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> >   int32x4_t v;
> >   __Int32x4_t _1;
> >   svint32_t _9;
> >   vector(4) int _11;
> >
> >   <bb 2> :
> >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >   v_12 = _1;
> >   _11 = v_12;
> >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >   return _9;
> >
> > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > view_convert_expr,
> > and the end result becomes:
> >   svint32_t _7;
> >   __Int32x4_t _8;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >   return _7;
> > ;;succ:   EXIT
> >
> > which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> > has incompatible types (svint32_t, int32x4_t).
> >
> > The attached patch disables simplification of VEC_PERM_EXPR
> > in simplify_permutation, if lhs and rhs have non compatible types,
> > which resolves ICE, but am not sure if it's the correct approach ?
>
> It for sure papers over the issue.  I think the error happens earlier,
> the V_C_E should have been built with the type of the VEC_PERM_EXPR
> which is the type of the LHS.  But then you probably run into the
> different sizes ICE (VLA vs constant size).  I think for this case you
> want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> selecting the "low" part of the VLA vector.
Hi Richard,
Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
represent dup operation
from fixed width to VLA vector. I am not sure how folding it to
BIT_FIELD_REF will work.
Could you please elaborate ?

Also, the issue doesn't seem restricted to this case.
The following test case also ICE's during forwprop:
svint32_t foo()
{
  int32x4_t v = (int32x4_t) {1, 2, 3, 4};
  svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
  return v2;
}

foo2.c: In function ‘foo’:
foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
9 | }
  | ^
svint32_t
int32x4_t
v2_4 = { 1, 2, 3, 4 };

because simplify_permutation folds
VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
into:
vector_cst {1, 2, 3, 4}

and it complains during verify_gimple_assign_single because we don't
support assignment of vector_cst to VLA vector.
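
To make the length mismatch concrete, here is a small standalone sketch (a toy model, not GCC code) of what the selector { 0, 1, 2, 3, ... } computes for one concrete run-time length. It assumes the selector is encoded with npatterns equal to the input length and nelts_per_pattern == 1, so output lane i reads selector entry i % npatterns, and that both permute inputs are the same vector. The helper name toy_dup_perm is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate VEC_PERM_EXPR <in, in, sel> for a concrete number of output
// lanes.  Assumes IN and SEL are non-empty, and that SEL holds one
// element per pattern (nelts_per_pattern == 1), so the selector simply
// repeats with period sel.size ().
std::vector<int>
toy_dup_perm (const std::vector<int> &in,
	      const std::vector<std::size_t> &sel,
	      std::size_t out_lanes)
{
  std::vector<int> out (out_lanes);
  for (std::size_t i = 0; i < out_lanes; ++i)
    {
      // Selector entry for lane i, taken from the repeating encoding.
      std::size_t idx = sel[i % sel.size ()];
      // Indices in [n, 2n) would pick from the second input; both
      // inputs are the same vector here, so reduce modulo n.
      out[i] = in[idx % in.size ()];
    }
  return out;
}
```

For the input {1, 2, 3, 4} and 8 output lanes this gives {1, 2, 3, 4, 1, 2, 3, 4}, so the permute is a dup of the quad and not the 4-element vector_cst it was folded to.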

I guess the issue really is that currently, only VEC_PERM_EXPR
supports lhs and rhs
to have vector types with differing lengths, and simplifying it to
other tree codes, like above,
will result in type errors ?

Thanks,
Prathamesh
>
> >
> > Alternatively, should we allow assignments from fixed-width to SVE
> > vector, so the above
> > VIEW_CONVERT_EXPR would result in dup ?
> >
> > Thanks,
> > Prathamesh


Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-12 Thread Richard Biener via Gcc-patches
On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi Richard,
> For the following test:
>
> svint32_t f2(int a, int b, int c, int d)
> {
>   int32x4_t v = (int32x4_t) {a, b, c, d};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> The compiler emits following ICE with -O3 -mcpu=generic+sve:
> foo.c: In function ‘f2’:
> foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> 4 | svint32_t f2(int a, int b, int c, int d)
>   |   ^~
> svint32_t
> __Int32x4_t
> _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> during GIMPLE pass: forwprop
> dump file: foo.c.109t.forwprop2
> foo.c:4:11: internal compiler error: verify_gimple failed
> 0xfda04a verify_gimple_in_cfg(function*, bool)
> ../../gcc/gcc/tree-cfg.cc:5568
> 0xe9371f execute_function_todo
> ../../gcc/gcc/passes.cc:2091
> 0xe93ccb execute_todo
> ../../gcc/gcc/passes.cc:2145
>
> This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
>   int32x4_t v;
>   __Int32x4_t _1;
>   svint32_t _9;
>   vector(4) int _11;
>
>   <bb 2> :
>   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>   v_12 = _1;
>   _11 = v_12;
>   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>   return _9;
>
> During forwprop, simplify_permutation simplifies vec_perm_expr to
> view_convert_expr,
> and the end result becomes:
>   svint32_t _7;
>   __Int32x4_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>   return _7;
> ;;succ:   EXIT
>
> which causes the error during verify_gimple since VIEW_CONVERT_EXPR
> has incompatible types (svint32_t, int32x4_t).
>
> The attached patch disables simplification of VEC_PERM_EXPR
> in simplify_permutation, if lhs and rhs have non compatible types,
> which resolves ICE, but am not sure if it's the correct approach ?

It for sure papers over the issue.  I think the error happens earlier,
the V_C_E should have been built with the type of the VEC_PERM_EXPR
which is the type of the LHS.  But then you probably run into the
different sizes ICE (VLA vs constant size).  I think for this case you
want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
selecting the "low" part of the VLA vector.

>
> Alternatively, should we allow assignments from fixed-width to SVE
> vector, so the above
> VIEW_CONVERT_EXPR would result in dup ?
>
> Thanks,
> Prathamesh


ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-12 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
For the following test:

svint32_t f2(int a, int b, int c, int d)
{
  int32x4_t v = (int32x4_t) {a, b, c, d};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

The compiler emits following ICE with -O3 -mcpu=generic+sve:
foo.c: In function ‘f2’:
foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
4 | svint32_t f2(int a, int b, int c, int d)
  |   ^~
svint32_t
__Int32x4_t
_7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
during GIMPLE pass: forwprop
dump file: foo.c.109t.forwprop2
foo.c:4:11: internal compiler error: verify_gimple failed
0xfda04a verify_gimple_in_cfg(function*, bool)
../../gcc/gcc/tree-cfg.cc:5568
0xe9371f execute_function_todo
../../gcc/gcc/passes.cc:2091
0xe93ccb execute_todo
../../gcc/gcc/passes.cc:2145

This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
  int32x4_t v;
  __Int32x4_t _1;
  svint32_t _9;
  vector(4) int _11;

  <bb 2> :
  _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
  v_12 = _1;
  _11 = v_12;
  _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
  return _9;

During forwprop, simplify_permutation simplifies vec_perm_expr to
view_convert_expr,
and the end result becomes:
  svint32_t _7;
  __Int32x4_t _8;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
  _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
  return _7;
;;succ:   EXIT

which causes the error during verify_gimple since VIEW_CONVERT_EXPR
has incompatible types (svint32_t, int32x4_t).

The attached patch disables simplification of VEC_PERM_EXPR
in simplify_permutation, if lhs and rhs have non compatible types,
which resolves ICE, but am not sure if it's the correct approach ?

Alternatively, should we allow assignments from fixed-width to SVE
vector, so the above
VIEW_CONVERT_EXPR would result in dup ?

Thanks,
Prathamesh
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 69567ab3275..be888f1c48e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2414,6 +2414,9 @@ simplify_permutation (gimple_stmt_iterator *gsi)
   if (TREE_CODE (op2) != VECTOR_CST)
     return 0;
 
+  if (!types_compatible_p (TREE_TYPE (gimple_get_lhs (stmt)), TREE_TYPE (op0)))
+    return 0;
+
   if (TREE_CODE (op0) == VECTOR_CST)
     {
       code = VECTOR_CST;