Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-16 Thread Richard Biener via Gcc-patches
On Tue, 15 Nov 2022, Richard Sandiford wrote:

> "Andre Vieira (lists)"  writes:
> > On 07/11/2022 11:05, Richard Biener wrote:
> >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >>
> >>> Sorry for the delay, just been reminded I still had this patch outstanding
> >>> from last stage 1. Hopefully since it has been mostly reviewed it could 
> >>> go in
> >>> for this stage 1?
> >>>
> >>> I addressed the comments and gave the slp-part of vectorizable_call some 
> >>> TLC
> >>> to make it work.
> >>>
> >>> I also changed vect_get_slp_defs as I noticed that the call from
> >>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >>> than the number of children in the slp_node
> >> how so?  Please fix that in the caller.  It looks like it probably
> >> should use vect_nargs instead?
> > Well that was my first intuition, but when I looked at it further the 
> > variant it's calling:
> > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > *vec_oprnds, unsigned n)
> >
> > Is actually creating a vector of vectors of slp defs. So for each child 
> > of slp_node it calls:
> > void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> >
> > Which returns a vector of vectorized defs. So vect_nargs would be the 
> > right size for the inner vec of vec_defs, but the outer should 
> > have the same number of elements as the original slp_node has children.
> >
> > However, at the call site (vectorizable_call), the operand we pass to 
> > vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
> > specialized for slp_node. I'll go see if I can change the call site to 
> > not have to do that, given the continue at the end of the if (slp_node) 
> > BB I don't think it needs to use vec_defs after it, but it may require 
> > some massaging to be able to define it separately for each code-path.
> >
> >>
> >>> , so that quick_push might not be
> >>> safe as is, so I added the reserve (n) to ensure it's safe to push. I 
> >>> didn't
> >>> actually come across any failure because of it though. Happy to split this
> >>> into a separate patch if needed.
> >>>
> >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >>> x86_64-pc-linux-gnu.
> >>>
> >>> OK for trunk?
> >> I'll leave final approval to Richard but
> >>
> >> - This only needs 1 bit, but occupies the full 16 to ensure a nice
> >> + This only needs 1 bit, but occupies the full 15 to ensure a nice
> >>layout.  */
> >> unsigned int vectorizable : 16;
> >>
> >> you don't actually change the width of the bitfield.  I would find
> >> it more natural to have
> >>
> >>signed int type0 : 7;
> >>signed int type0_vtrans : 1;
> >>signed int type1 : 7;
> >>signed int type1_vtrans : 1;
> >>
> >> with typeN_vtrans specifying how the types transform when vectorized.
> >> I would imagine another variant we could need is narrow/widen
> >> according to either result or other argument type?  That said,
> >> just your flag would then be
> >>
> >>signed int type0 : 7;
> >>signed int pad   : 1;
> >>signed int type1 : 7;
> >>signed int type1_vect_as_scalar : 1;
> >>
> >> ?
> > That's a cool idea! I'll leave it as a single bit for now like that, if 
> > we want to re-use it for multiple transformations we will obviously need 
> > to rename & give it more bits.
> 
> I think we should steal bits from vectorizable rather than shrink
> type0 and type1 though.  Then add a 14-bit padding field to show
> how many bits are left.
> 
> > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
> >rhs_type = unsigned_type_node;
> >  }
> > 
> > +  /* The argument that is not of the same type as the others.  */
> >int mask_opno = -1;
> > +  int scalar_opno = -1;
> >if (internal_fn_p (cfn))
> > -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> > +{
> > +  internal_fn ifn = as_internal_fn (cfn);
> > +  if (direct_internal_fn_p (ifn)
> > + && direct_internal_fn (ifn).type1_is_scalar_p)
> > +   scalar_opno = direct_internal_fn (ifn).type1;
> > +  else
> > +   /* For masked operations this represents the argument that carries the
> > +  mask.  */
> > +   mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> 
> This doesn't seem logically like an else.  We should do both.
> 
> LGTM otherwise for the bits outside match.pd.  If Richard's happy with
> the match.pd bits then I think the patch is OK with those changes and
> without the vect_get_slp_defs thing (as you mentioned downthread).

Yes, the match.pd part looked OK.

> Thanks,
> Richard
> 
> 
> >>
> >>> gcc/ChangeLog:
> >>>
> >>>      * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
> >>>      * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>>      (frintnz_mode): New int attribute.
> >>>      (VSFDF): Make iterator conditional.
> >>>      * internal-fn.def (FTRUNC_INT): New IFN.
> >>>      * internal-fn.cc (ftrunc_int_direct): New define.

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-15 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 07/11/2022 11:05, Richard Biener wrote:
>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>
>>> Sorry for the delay, just been reminded I still had this patch outstanding
>>> from last stage 1. Hopefully since it has been mostly reviewed it could go 
>>> in
>>> for this stage 1?
>>>
>>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
>>> to make it work.
>>>
>>> I also changed vect_get_slp_defs as I noticed that the call from
>>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
>>> than the number of children in the slp_node
>> how so?  Please fix that in the caller.  It looks like it probably
>> should use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the 
> variant it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > *vec_oprnds, unsigned n)
>
> Is actually creating a vector of vectors of slp defs. So for each child 
> of slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>
> Which returns a vector of vectorized defs. So vect_nargs would be the 
> right size for the inner vec of vec_defs, but the outer should 
> have the same number of elements as the original slp_node has children.
>
> However, at the call site (vectorizable_call), the operand we pass to 
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
> specialized for slp_node. I'll go see if I can change the call site to 
> not have to do that, given the continue at the end of the if (slp_node) 
> BB I don't think it needs to use vec_defs after it, but it may require 
> some massaging to be able to define it separately for each code-path.
>
>>
>>> , so that quick_push might not be
>>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
>>> actually come across any failure because of it though. Happy to split this
>>> into a separate patch if needed.
>>>
>>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>>> x86_64-pc-linux-gnu.
>>>
>>> OK for trunk?
>> I'll leave final approval to Richard but
>>
>> - This only needs 1 bit, but occupies the full 16 to ensure a nice
>> + This only needs 1 bit, but occupies the full 15 to ensure a nice
>>layout.  */
>> unsigned int vectorizable : 16;
>>
>> you don't actually change the width of the bitfield.  I would find
>> it more natural to have
>>
>>signed int type0 : 7;
>>signed int type0_vtrans : 1;
>>signed int type1 : 7;
>>signed int type1_vtrans : 1;
>>
>> with typeN_vtrans specifying how the types transform when vectorized.
>> I would imagine another variant we could need is narrow/widen
>> according to either result or other argument type?  That said,
>> just your flag would then be
>>
>>signed int type0 : 7;
>>signed int pad   : 1;
>>signed int type1 : 7;
>>signed int type1_vect_as_scalar : 1;
>>
>> ?
> That's a cool idea! I'll leave it as a single bit for now like that, if 
> we want to re-use it for multiple transformations we will obviously need 
> to rename & give it more bits.

I think we should steal bits from vectorizable rather than shrink
type0 and type1 though.  Then add a 14-bit padding field to show
how many bits are left.
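
For concreteness, a minimal sketch of the layout this suggests (illustrative only; the field names come from this thread, not from committed code, and the real structure carries more comments):

struct direct_internal_fn_info_sketch
{
  signed int type0 : 7;
  signed int type1 : 7;
  unsigned int vectorizable : 1;
  unsigned int type1_is_scalar_p : 1;
  /* The 16 bits previously occupied by 'vectorizable' are split into the
     one bit actually needed, the new flag, and explicit padding that
     documents how many bits remain.  */
  unsigned int pad : 14;
};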

> @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
>rhs_type = unsigned_type_node;
>  }
> 
> +  /* The argument that is not of the same type as the others.  */
>int mask_opno = -1;
> +  int scalar_opno = -1;
>if (internal_fn_p (cfn))
> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +{
> +  internal_fn ifn = as_internal_fn (cfn);
> +  if (direct_internal_fn_p (ifn)
> +   && direct_internal_fn (ifn).type1_is_scalar_p)
> + scalar_opno = direct_internal_fn (ifn).type1;
> +  else
> + /* For masked operations this represents the argument that carries the
> +mask.  */
> + mask_opno = internal_fn_mask_index (as_internal_fn (cfn));

This doesn't seem logically like an else.  We should do both.

LGTM otherwise for the bits outside match.pd.  If Richard's happy with
the match.pd bits then I think the patch is OK with those changes and
without the vect_get_slp_defs thing (as you mentioned downthread).

Thanks,
Richard


>>
>>> gcc/ChangeLog:
>>>
>>>      * config/aarch64/aarch64.md (ftrunc2): New
>>> pattern.
>>>      * config/aarch64/iterators.md (FRINTNZ): New iterator.
>>>      (frintnz_mode): New int attribute.
>>>      (VSFDF): Make iterator conditional.
>>>      * internal-fn.def (FTRUNC_INT): New IFN.
>>>      * internal-fn.cc (ftrunc_int_direct): New define.
>>>      (expand_ftrunc_int_optab_fn): New custom expander.
>>>      (direct_ftrunc_int_optab_supported_p): New supported_p.
>>>      * internal-fn.h (direct_internal_fn_info): Add new member
>>>      type1_is_scalar_p.
>>>      * match.pd: Add to the existing TRUNC pattern match.
>>>     

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-09 Thread Andre Vieira (lists) via Gcc-patches



On 07/11/2022 14:56, Richard Biener wrote:

On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:


On 07/11/2022 11:05, Richard Biener wrote:

On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:


Sorry for the delay, just been reminded I still had this patch outstanding
from last stage 1. Hopefully since it has been mostly reviewed it could go
in
for this stage 1?

I addressed the comments and gave the slp-part of vectorizable_call some
TLC
to make it work.

I also changed vect_get_slp_defs as I noticed that the call from
vectorizable_call was creating an auto_vec with 'nargs' that might be less
than the number of children in the slp_node

how so?  Please fix that in the caller.  It looks like it probably
should use vect_nargs instead?

Well that was my first intuition, but when I looked at it further the variant
it's calling:
void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > *vec_oprnds, unsigned n)

Is actually creating a vector of vectors of slp defs. So for each child of
slp_node it calls:
void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)

Which returns a vector of vectorized defs. So vect_nargs would be the right
size for the inner vec of vec_defs, but the outer should have the same
number of elements as the original slp_node has children.

No, the inner vector is the vector of vectors for each arg, the outer
vector should be the one for each argument.  Hm, that was a confusing
sentence.

That said, the number of SLP children of a call node should eventually
be the number of arguments of the call (plus masks, etc.).  So it
looks about correct besides the vec_nargs issue?
Yeah you are right, I misunderstood what the children were, so you have 
a child node for each argument of the call. Though, since you iterate 
over the 'scalar' arguments of the call I actually think 'nargs' was 
correct to begin with, which would explain why this never went wrong... 
So I think it is actually correct as is, I must have gotten confused by 
some earlier investigation into how to deal with the scalar arguments... 
Sorry for the noise, I'll undo these changes to the patch.
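
To make the resolved structure concrete, here is a standalone sketch of the two-level layout discussed above, with std::vector standing in for GCC's vec and all names invented for the example:

#include <cstddef>
#include <string>
#include <vector>

// A "vectorized def" stands in for whatever one vector statement defines.
using vec_def = std::string;
using arg_defs = std::vector<vec_def>;   // inner: all defs of one argument
using all_defs = std::vector<arg_defs>;  // outer: one entry per argument

// The outer vector is sized by the number of scalar call arguments
// (nargs), i.e. one entry per SLP child; each inner vector holds that
// argument's vectorized defs.
all_defs
get_defs (std::size_t nargs, std::size_t ncopies)
{
  all_defs vec_defs;
  vec_defs.reserve (nargs);   // 'nargs' is the right outer size
  for (std::size_t i = 0; i < nargs; ++i)
    vec_defs.emplace_back (ncopies, vec_def ("def"));
  return vec_defs;
}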


Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-07 Thread Richard Biener via Gcc-patches
On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:

> 
> On 07/11/2022 11:05, Richard Biener wrote:
> > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >
> >> Sorry for the delay, just been reminded I still had this patch outstanding
> >> from last stage 1. Hopefully since it has been mostly reviewed it could go
> >> in
> >> for this stage 1?
> >>
> >> I addressed the comments and gave the slp-part of vectorizable_call some
> >> TLC
> >> to make it work.
> >>
> >> I also changed vect_get_slp_defs as I noticed that the call from
> >> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >> than the number of children in the slp_node
> > how so?  Please fix that in the caller.  It looks like it probably
> > should use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the variant
> it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > *vec_oprnds, unsigned n)
> 
> Is actually creating a vector of vectors of slp defs. So for each child of
> slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> 
> Which returns a vector of vectorized defs. So vect_nargs would be the right
> size for the inner vec of vec_defs, but the outer should have the same
> number of elements as the original slp_node has children.

No, the inner vector is the vector of vectors for each arg, the outer
vector should be the one for each argument.  Hm, that was a confusing
sentence.

That said, the number of SLP children of a call node should eventually
be the number of arguments of the call (plus masks, etc.).  So it
looks about correct besides the vec_nargs issue?

> 
> However, at the call site (vectorizable_call), the operand we pass to
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is
> specialized for slp_node. I'll go see if I can change the call site to not
> have to do that, given the continue at the end of the if (slp_node) BB I don't
> think it needs to use vec_defs after it, but it may require some massaging to
> be able to define it separately for each code-path.
> 
> >
> >> , so that quick_push might not be
> >> safe as is, so I added the reserve (n) to ensure it's safe to push. I
> >> didn't
> >> actually come across any failure because of it though. Happy to split this
> >> into a separate patch if needed.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> >>
> >> OK for trunk?
> > I'll leave final approval to Richard but
> >
> > - This only needs 1 bit, but occupies the full 16 to ensure a nice
> > + This only needs 1 bit, but occupies the full 15 to ensure a nice
> >layout.  */
> > unsigned int vectorizable : 16;
> >
> > you don't actually change the width of the bitfield.  I would find
> > it more natural to have
> >
> >signed int type0 : 7;
> >signed int type0_vtrans : 1;
> >signed int type1 : 7;
> >signed int type1_vtrans : 1;
> >
> > with typeN_vtrans specifying how the types transform when vectorized.
> > I would imagine another variant we could need is narrow/widen
> > according to either result or other argument type?  That said,
> > just your flag would then be
> >
> >signed int type0 : 7;
> >signed int pad   : 1;
> >signed int type1 : 7;
> >signed int type1_vect_as_scalar : 1;
> >
> > ?
> That's a cool idea! I'll leave it as a single bit for now like that, if we
> want to re-use it for multiple transformations we will obviously need to
> rename & give it more bits.
> >
> >> gcc/ChangeLog:
> >>
> >>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
> >>      * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>      (frintnz_mode): New int attribute.
> >>      (VSFDF): Make iterator conditional.
> >>      * internal-fn.def (FTRUNC_INT): New IFN.
> >>      * internal-fn.cc (ftrunc_int_direct): New define.
> >>      (expand_ftrunc_int_optab_fn): New custom expander.
> >>      (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>      * internal-fn.h (direct_internal_fn_info): Add new member
> >>      type1_is_scalar_p.
> >>      * match.pd: Add to the existing TRUNC pattern match.
> >>      * optabs.def (ftrunc_int): New entry.
> >>      * stor-layout.h (element_precision): Moved from here...
> >>      * tree.h (element_precision): ... to here.
> >>      (element_type): New declaration.
> >>      * tree.cc (element_type): New function.
> >>      (element_precision): Changed to use element_type.
> >>      * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >> support for
> >>      IFNs with different input types.
> >>      (vect_get_scalar_oprnds): New function.
> >>      (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>      * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>      (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>    

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-07 Thread Andre Vieira (lists) via Gcc-patches



On 07/11/2022 11:05, Richard Biener wrote:

On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:


Sorry for the delay, just been reminded I still had this patch outstanding
from last stage 1. Hopefully since it has been mostly reviewed it could go in
for this stage 1?

I addressed the comments and gave the slp-part of vectorizable_call some TLC
to make it work.

I also changed vect_get_slp_defs as I noticed that the call from
vectorizable_call was creating an auto_vec with 'nargs' that might be less
than the number of children in the slp_node

how so?  Please fix that in the caller.  It looks like it probably
should use vect_nargs instead?
Well that was my first intuition, but when I looked at it further the 
variant it's calling:
void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > *vec_oprnds, unsigned n)


Is actually creating a vector of vectors of slp defs. So for each child 
of slp_node it calls:

void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)

Which returns a vector of vectorized defs. So vect_nargs would be the 
right size for the inner vec of vec_defs, but the outer should 
have the same number of elements as the original slp_node has children.


However, at the call site (vectorizable_call), the operand we pass to 
vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
specialized for slp_node. I'll go see if I can change the call site to 
not have to do that, given the continue at the end of the if (slp_node) 
BB I don't think it needs to use vec_defs after it, but it may require 
some massaging to be able to define it separately for each code-path.





, so that quick_push might not be
safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
actually come across any failure because of it though. Happy to split this
into a separate patch if needed.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu.

OK for trunk?

I'll leave final approval to Richard but

- This only needs 1 bit, but occupies the full 16 to ensure a nice
+ This only needs 1 bit, but occupies the full 15 to ensure a nice
   layout.  */
unsigned int vectorizable : 16;

you don't actually change the width of the bitfield.  I would find
it more natural to have

   signed int type0 : 7;
   signed int type0_vtrans : 1;
   signed int type1 : 7;
   signed int type1_vtrans : 1;

with typeN_vtrans specifying how the types transform when vectorized.
I would imagine another variant we could need is narrow/widen
according to either result or other argument type?  That said,
just your flag would then be

   signed int type0 : 7;
   signed int pad   : 1;
   signed int type1 : 7;
   signed int type1_vect_as_scalar : 1;

?
That's a cool idea! I'll leave it as a single bit for now like that, if 
we want to re-use it for multiple transformations we will obviously need 
to rename & give it more bits.



gcc/ChangeLog:

     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
     * config/aarch64/iterators.md (FRINTNZ): New iterator.
     (frintnz_mode): New int attribute.
     (VSFDF): Make iterator conditional.
     * internal-fn.def (FTRUNC_INT): New IFN.
     * internal-fn.cc (ftrunc_int_direct): New define.
     (expand_ftrunc_int_optab_fn): New custom expander.
     (direct_ftrunc_int_optab_supported_p): New supported_p.
     * internal-fn.h (direct_internal_fn_info): Add new member
     type1_is_scalar_p.
     * match.pd: Add to the existing TRUNC pattern match.
     * optabs.def (ftrunc_int): New entry.
     * stor-layout.h (element_precision): Moved from here...
     * tree.h (element_precision): ... to here.
     (element_type): New declaration.
     * tree.cc (element_type): New function.
     (element_precision): Changed to use element_type.
     * tree-vect-stmts.cc (vectorizable_internal_function): Add
support for
     IFNs with different input types.
     (vect_get_scalar_oprnds): New function.
     (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
     * tree-vect-slp.cc (check_scalar_arg_ok): New function.
     (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
     (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
     * doc/md.texi: New entry for ftrunc pattern name.
     * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

     * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
instructions available.
     * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
aarch64_frintz options.
     * gcc.target/aarch64/frintnz.c: New test.
     * gcc.target/aarch64/frintnz_vec.c: New test.
     * gcc.target/aarch64/frintnz_slp.c: New test.



Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-07 Thread Richard Biener via Gcc-patches
On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:

> Sorry for the delay, just been reminded I still had this patch outstanding
> from last stage 1. Hopefully since it has been mostly reviewed it could go in
> for this stage 1?
> 
> I addressed the comments and gave the slp-part of vectorizable_call some TLC
> to make it work.
> 
> I also changed vect_get_slp_defs as I noticed that the call from
> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> than the number of children in the slp_node

how so?  Please fix that in the caller.  It looks like it probably
should use vect_nargs instead?

> , so that quick_push might not be
> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
> actually come across any failure because of it though. Happy to split this
> into a separate patch if needed.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.
> 
> OK for trunk?

I'll leave final approval to Richard but

- This only needs 1 bit, but occupies the full 16 to ensure a nice
+ This only needs 1 bit, but occupies the full 15 to ensure a nice
  layout.  */
   unsigned int vectorizable : 16;

you don't actually change the width of the bitfield.  I would find
it more natural to have

  signed int type0 : 7;
  signed int type0_vtrans : 1;
  signed int type1 : 7;
  signed int type1_vtrans : 1;

with typeN_vtrans specifying how the types transform when vectorized.
I would imagine another variant we could need is narrow/widen
according to either result or other argument type?  That said,
just your flag would then be

  signed int type0 : 7;
  signed int pad   : 1;
  signed int type1 : 7;
  signed int type1_vect_as_scalar : 1; 

?

> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
>     * config/aarch64/iterators.md (FRINTNZ): New iterator.
>     (frintnz_mode): New int attribute.
>     (VSFDF): Make iterator conditional.
>     * internal-fn.def (FTRUNC_INT): New IFN.
>     * internal-fn.cc (ftrunc_int_direct): New define.
>     (expand_ftrunc_int_optab_fn): New custom expander.
>     (direct_ftrunc_int_optab_supported_p): New supported_p.
>     * internal-fn.h (direct_internal_fn_info): Add new member
>     type1_is_scalar_p.
>     * match.pd: Add to the existing TRUNC pattern match.
>     * optabs.def (ftrunc_int): New entry.
>     * stor-layout.h (element_precision): Moved from here...
>     * tree.h (element_precision): ... to here.
>     (element_type): New declaration.
>     * tree.cc (element_type): New function.
>     (element_precision): Changed to use element_type.
>     * tree-vect-stmts.cc (vectorizable_internal_function): Add 
> support for
>     IFNs with different input types.
>     (vect_get_scalar_oprnds): New function.
>     (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>     * tree-vect-slp.cc (check_scalar_arg_ok): New function.
>     (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
>     (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
>     * doc/md.texi: New entry for ftrunc pattern name.
>     * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> instructions available.
>     * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> aarch64_frintz options.
>     * gcc.target/aarch64/frintnz.c: New test.
>     * gcc.target/aarch64/frintnz_vec.c: New test.
>     * gcc.target/aarch64/frintnz_slp.c: New test.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-04 Thread Andre Vieira (lists) via Gcc-patches
Sorry for the delay, just been reminded I still had this patch 
outstanding from last stage 1. Hopefully since it has been mostly 
reviewed it could go in for this stage 1?


I addressed the comments and gave the slp-part of vectorizable_call some 
TLC to make it work.


I also changed vect_get_slp_defs as I noticed that the call from 
vectorizable_call was creating an auto_vec with 'nargs' that might be 
less than the number of children in the slp_node, so that quick_push 
might not be safe as is, so I added the reserve (n) to ensure it's safe 
to push. I didn't actually come across any failure because of it though. 
Happy to split this into a separate patch if needed.


Bootstrapped and regression tested on aarch64-none-linux-gnu and 
x86_64-pc-linux-gnu.


OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTNZ): New iterator.
    (frintnz_mode): New int attribute.
    (VSFDF): Make iterator conditional.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * internal-fn.cc (ftrunc_int_direct): New define.
    (expand_ftrunc_int_optab_fn): New custom expander.
    (direct_ftrunc_int_optab_supported_p): New supported_p.
    * internal-fn.h (direct_internal_fn_info): Add new member
    type1_is_scalar_p.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.
    * stor-layout.h (element_precision): Moved from here...
    * tree.h (element_precision): ... to here.
    (element_type): New declaration.
    * tree.cc (element_type): New function.
    (element_precision): Changed to use element_type.
    * tree-vect-stmts.cc (vectorizable_internal_function): Add 
support for

    IFNs with different input types.
    (vect_get_scalar_oprnds): New function.
    (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
    * tree-vect-slp.cc (check_scalar_arg_ok): New function.
    (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
    (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
    * doc/md.texi: New entry for ftrunc pattern name.
    * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz 
instructions available.
    * lib/target-supports.exp: Added aarch64_frintnzx_ok target and 
aarch64_frintz options.

    * gcc.target/aarch64/frintnz.c: New test.
    * gcc.target/aarch64/frintnz_vec.c: New test.
    * gcc.target/aarch64/frintnz_slp.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..47368e09b106e5b43640bd4f113abd0b9a15b9c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7564,12 +7564,18 @@
(set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+ FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
  FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "\\t%0, %1"
   [(set_attr "type" "f_rint")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a8ad4e5ff215ade06c3ca13a24ef18d259afcb6c..b1f78d87fbe6118e792b00580c6beb23ce63e27c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -173,7 +173,11 @@
  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+(V4SF "TARGET_SIMD")
+(V2DF "TARGET_SIMD")
+(DF "TARGET_FLOAT")
+(SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3136,6 +3140,8 @@
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
   UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRKP [UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3545,6 +3551,8 @@
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
			       (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
   

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-14 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar 
> or
> +vector floating-point mode.  An exception must be raised if operand 1 does 
> not
> +fit in a @var{n} mode signed integer as it would have if the truncation
> +happened through separate floating point to integer conversion.
> +
> +

Nit: just one blank line here.

>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact 
> hardware vector length.
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
>  instruction.
> +
> +@item aarch64_frintzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
>  @end table
>  
>  @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab 
> optab)
>return convert_optab_handler (optab, imode, vmode);
>  }
>  
> +/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[2];
> +  tree lhs, float_type, int_type;
> +  rtx target, op;
> +
> +  lhs = gimple_call_lhs (stmt);
> +  target = expand_normal (lhs);
> +  op = expand_normal (gimple_call_arg (stmt, 0));
> +
> +  float_type = TREE_TYPE (lhs);
> +  int_type = element_type (gimple_call_arg (stmt, 1));

Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type.  That in
turn means that…

> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> +  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> +  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> +   TYPE_MODE (int_type)), 2, ops);
> +}
> +
>  /* Expand LOAD_LANES call STMT using optab OPTAB.  */
>  
>  static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
> != CODE_FOR_nothing);
>  }
>  
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> +  optimization_type opt_type)
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> + TYPE_MODE (element_type (types.second)),
> + opt_type) != CODE_FOR_nothing);
> +}
> +

…this can use convert_optab_supported_p.
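
For reference, the simplification being suggested would look like this (a sketch only, assuming the second entry of "types" is now the scalar integer type itself rather than something to take the element type of):

/* With element_type gone, the body matches the generic helper in
   internal-fn.c, so the definition can simply become
   #define direct_ftrunc_int_optab_supported_p convert_optab_supported_p
   which is equivalent to:  */
static bool
direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
				     optimization_type opt_type)
{
  return (convert_optab_handler (optab, TYPE_MODE (types.first),
				 TYPE_MODE (types.second), opt_type)
	  != CODE_FOR_nothing);
}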

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Is this just a cut-&-pasto from a run test?  If not, why do we need both
this and the dg-options?  It feels like one on its own should be enough,
with the dg-options being better.

The test looks OK without this line.

> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	frint32z	s0, s0
> +**   ret
> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	frint64z	s0, s0
> +**   ret
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
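
As a standalone restatement of the semantics these tests rely on (plain C++, not part of the patch): frint32z behaves like truncation through a 32-bit signed intermediate, so for in-range values f1 above is equivalent to:

#include <cstdint>

// Truncate towards zero as if converted to int32_t and back.  The
// hardware instruction folds the two conversions into one and raises an
// FP exception when the value does not fit in int32_t (where the plain
// C++ conversion below would instead be undefined behaviour).
float
trunc_via_int32 (float x)
{
  std::int32_t y = static_cast<std::int32_t> (x);  // rounds towards zero
  return static_cast<float> (y);
}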

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-10 Thread Richard Biener via Gcc-patches
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote:

> Yeah seems I forgot to send the latest version, my bad.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

The match.pd part looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
>     * config/aarch64/iterators.md (FRINTNZ): New iterator.
>     (frintnz_mode): New int attribute.
>     (VSFDF): Make iterator conditional.
>     * internal-fn.def (FTRUNC_INT): New IFN.
>     * internal-fn.c (ftrunc_int_direct): New define.
>     (expand_ftrunc_int_optab_fn): New custom expander.
>     (direct_ftrunc_int_optab_supported_p): New supported_p.
>     * match.pd: Add to the existing TRUNC pattern match.
>     * optabs.def (ftrunc_int): New entry.
>     * stor-layout.h (element_precision): Moved from here...
>     * tree.h (element_precision): ... to here.
>     (element_type): New declaration.
>     * tree.c (element_type): New function.
>     (element_precision): Changed to use element_type.
>     * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>     IFNs with different input types.
>     (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>     * doc/md.texi: New entry for ftrunc pattern name.
>     * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>     * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>     * gcc.target/aarch64/frintnz.c: New test.
>     * gcc.target/aarch64/frintnz_vec.c: New test.
> 
> On 03/01/2022 12:18, Richard Biener wrote:
> > On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richard,
> >>
> >> Thank you for the review, I've adopted all above suggestions downstream, I
> >> am
> >> still surprised how many style things I still miss after years of gcc
> >> development :(
> >>
> >> On 17/12/2021 12:44, Richard Sandiford wrote:
> >>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >>>> 	rhs_type = unsigned_type_node;
> >>>>   }
> >>>> 
> >>>> -  int mask_opno = -1;
> >>>> +  /* The argument that is not of the same type as the others.  */
> >>>> +  int diff_opno = -1;
> >>>> +  bool masked = false;
> >>>>  if (internal_fn_p (cfn))
> >>>> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> +{
> >>>> +  if (cfn == CFN_FTRUNC_INT)
> >>>> +   /* For FTRUNC this represents the argument that carries the type of the
> >>>> +  intermediate signed integer.  */
> >>>> +   diff_opno = 1;
> >>>> +  else
> >>>> +   {
> >>>> + /* For masked operations this represents the argument that carries the
> >>>> +mask.  */
> >>>> + diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> + masked = diff_opno >=  0;
> >>>> +   }
> >>>> +}
> >>> I think it would be cleaner not to process argument 1 at all for
> >>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> >> I agree with this,  will change the loop to continue for argument 1 when
> >> dealing with non-masked CFN's.
> >>
> >>>> }
> >>>> […]
> >>>> diff --git a/gcc/tree.c b/gcc/tree.c
> >>>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> >>>> --- a/gcc/tree.c
> >>>> +++ b/gcc/tree.c
> >>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
> >>>>    return true;
> >>>>  }
> >>>> 
> >>>> -/* Return the precision of the type, or for a complex or vector type the
> >>>> -   precision of the type of its elements.  */
> >>>> +/* Return the type, or for a complex or vector type the type of its
> >>>> +   elements.  */
> >>>> 
> >>>> -unsigned int
> >>>> -element_precision (const_tree type)
> >>>> +tree
> >>>> +element_type (const_tree type)
> >>>>  {
> >>>>    if (!TYPE_P (type))
> >>>>      type = TREE_TYPE (type);
> >>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >>>>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>>>      type = TREE_TYPE (type);
> >>>> 
> >>>> -  return TYPE_PRECISION (type);
> >>>> +  return (tree) type;
> >>> I think we should stick a const_cast in element_precision and make
> >>> element_type take a plain “type”.  As it stands element_type is an
> >>> implicit const_cast for many cases.
> >>>
> >>> Thanks,
> >> Was just curious about something here, I thought the purpose of having
> >> element_precision (before) and element_type (now) take a const_tree as an
> >> argument was to make it clear we aren't changing the input type. I
> >> understand
> >> that as it stands element_type could be an implicit const_cast (which I
> >> should
> >> be using rather than the '(tree)' cast), but that's only if 'type' is a

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-10 Thread Andre Vieira (lists) via Gcc-patches

Yeah seems I forgot to send the latest version, my bad.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTNZ): New iterator.
    (frintnz_mode): New int attribute.
    (VSFDF): Make iterator conditional.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * internal-fn.c (ftrunc_int_direct): New define.
    (expand_ftrunc_int_optab_fn): New custom expander.
    (direct_ftrunc_int_optab_supported_p): New supported_p.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.
    * stor-layout.h (element_precision): Moved from here...
    * tree.h (element_precision): ... to here.
    (element_type): New declaration.
    * tree.c (element_type): New function.
    (element_precision): Changed to use element_type.
    * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for

    IFNs with different input types.
    (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
    * doc/md.texi: New entry for ftrunc pattern name.
    * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.
    * gcc.target/aarch64/frintnz_vec.c: New test.

On 03/01/2022 12:18, Richard Biener wrote:

On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:


Hi Richard,

Thank you for the review, I've adopted all above suggestions downstream, I am
still surprised how many style things I still miss after years of gcc
development :(

On 17/12/2021 12:44, Richard Sandiford wrote:

@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
 rhs_type = unsigned_type_node;
   }
   -  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
 if (internal_fn_p (cfn))
-mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+{
+  if (cfn == CFN_FTRUNC_INT)
+   /* For FTRUNC this represents the argument that carries the type of
the
+  intermediate signed integer.  */
+   diff_opno = 1;
+  else
+   {
+ /* For masked operations this represents the argument that carries
the
+mask.  */
+ diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+ masked = diff_opno >=  0;
+   }
+}

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.

I agree with this,  will change the loop to continue for argument 1 when
dealing with non-masked CFN's.


}
[…]
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
cst_size_error *perr /* = NULL */)
 return true;
   }
   
-/* Return the precision of the type, or for a complex or vector type the

-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
   -unsigned int
-element_precision (const_tree type)
+tree
+element_type (const_tree type)
   {
 if (!TYPE_P (type))
   type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
 if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
   type = TREE_TYPE (type);
   -  return TYPE_PRECISION (type);
+  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,

Was just curious about something here, I thought the purpose of having
element_precision (before) and element_type (now) take a const_tree as an
argument was to make it clear we aren't changing the input type. I understand
that as it stands element_type could be an implicit const_cast (which I should
be using rather than the '(tree)' cast), but that's only if 'type' is a type
that isn't complex/vector, either way, we are conforming to the promise that
we aren't changing the incoming type, what the caller then does with the
result is up to them no?

I don't mind making the changes, just trying to understand the reasoning
behind it.

I'll send in a new patch with all the changes after the review on the match.pd
stuff.

I'm missing an updated patch after my initial review of the match.pd
stuff so not sure what to review.  Can you re-post and updated patch?

Thanks,
Richard.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c 100644
--- a/gcc/config

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-03 Thread Richard Biener via Gcc-patches
On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thank you for the review, I've adopted all above suggestions downstream, I am
> still surprised how many style things I still miss after years of gcc
> development :(
> 
> On 17/12/2021 12:44, Richard Sandiford wrote:
> >
> >> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >> rhs_type = unsigned_type_node;
> >>   }
> >>   -  int mask_opno = -1;
> >> +  /* The argument that is not of the same type as the others.  */
> >> +  int diff_opno = -1;
> >> +  bool masked = false;
> >> if (internal_fn_p (cfn))
> >> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +{
> >> +  if (cfn == CFN_FTRUNC_INT)
> >> +  /* For FTRUNC this represents the argument that carries the type of
> >> the
> >> + intermediate signed integer.  */
> >> +  diff_opno = 1;
> >> +  else
> >> +  {
> >> +/* For masked operations this represents the argument that carries
> >> the
> >> +   mask.  */
> >> +diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +masked = diff_opno >=  0;
> >> +  }
> >> +}
> > I think it would be cleaner not to process argument 1 at all for
> > CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> 
> I agree with this,  will change the loop to continue for argument 1 when
> dealing with non-masked CFN's.
> 
> >>}
> >> […]
> >> diff --git a/gcc/tree.c b/gcc/tree.c
> >> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> >> --- a/gcc/tree.c
> >> +++ b/gcc/tree.c
> >> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> >> cst_size_error *perr /* = NULL */)
> >> return true;
> >>   }
> >>   
> >> -/* Return the precision of the type, or for a complex or vector type the
> >> -   precision of the type of its elements.  */
> >> +/* Return the type, or for a complex or vector type the type of its
> >> +   elements.  */
> >>   -unsigned int
> >> -element_precision (const_tree type)
> >> +tree
> >> +element_type (const_tree type)
> >>   {
> >> if (!TYPE_P (type))
> >>   type = TREE_TYPE (type);
> >> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >> if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>   type = TREE_TYPE (type);
> >>   -  return TYPE_PRECISION (type);
> >> +  return (tree) type;
> > I think we should stick a const_cast in element_precision and make
> > element_type take a plain “type”.  As it stands element_type is an
> > implicit const_cast for many cases.
> >
> > Thanks,
> Was just curious about something here, I thought the purpose of having
> element_precision (before) and element_type (now) take a const_tree as an
> argument was to make it clear we aren't changing the input type. I understand
> that as it stands element_type could be an implicit const_cast (which I should
> be using rather than the '(tree)' cast), but that's only if 'type' is a type
> that isn't complex/vector, either way, we are conforming to the promise that
> we aren't changing the incoming type, what the caller then does with the
> result is up to them no?
> 
> I don't mind making the changes, just trying to understand the reasoning
> behind it.
> 
> I'll send in a new patch with all the changes after the review on the match.pd
> stuff.

I'm missing an updated patch after my initial review of the match.pd
stuff so not sure what to review.  Can you re-post and updated patch?

Thanks,
Richard.


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-29 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 17/12/2021 12:44, Richard Sandiford wrote:
>>
>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>> rhs_type = unsigned_type_node;
>>>   }
>>>   
>>> -  int mask_opno = -1;
>>> +  /* The argument that is not of the same type as the others.  */
>>> +  int diff_opno = -1;
>>> +  bool masked = false;
>>> if (internal_fn_p (cfn))
>>> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> +{
>>> +  if (cfn == CFN_FTRUNC_INT)
>>> +   /* For FTRUNC this represents the argument that carries the type of the
>>> +  intermediate signed integer.  */
>>> +   diff_opno = 1;
>>> +  else
>>> +   {
>>> + /* For masked operations this represents the argument that carries the
>>> +mask.  */
>>> + diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> + masked = diff_opno >=  0;
>>> +   }
>>> +}
>> I think it would be cleaner not to process argument 1 at all for
>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
>
> I agree with this,  will change the loop to continue for argument 1 when 
> dealing with non-masked CFN's.
>
>>> }
>>> […]
>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
>>> --- a/gcc/tree.c
>>> +++ b/gcc/tree.c
>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, 
>>> cst_size_error *perr /* = NULL */)
>>> return true;
>>>   }
>>>   
>>> -/* Return the precision of the type, or for a complex or vector type the
>>> -   precision of the type of its elements.  */
>>> +/* Return the type, or for a complex or vector type the type of its
>>> +   elements.  */
>>>   
>>> -unsigned int
>>> -element_precision (const_tree type)
>>> +tree
>>> +element_type (const_tree type)
>>>   {
>>> if (!TYPE_P (type))
>>>   type = TREE_TYPE (type);
>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>> if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>>   type = TREE_TYPE (type);
>>>   
>>> -  return TYPE_PRECISION (type);
>>> +  return (tree) type;
>> I think we should stick a const_cast in element_precision and make
>> element_type take a plain “type”.  As it stands element_type is an
>> implicit const_cast for many cases.
>>
>> Thanks,
> Was just curious about something here, I thought the purpose of having 
> element_precision (before) and element_type (now) take a const_tree as 
> an argument was to make it clear we aren't changing the input type. I 
> understand that as it stands element_type could be an implicit 
> const_cast (which I should be using rather than the '(tree)' cast), but 
> that's only if 'type' is a type that isn't complex/vector, either way, 
> we are conforming to the promise that we aren't changing the incoming 
> type, what the caller then does with the result is up to them no?
>
> I don't mind making the changes, just trying to understand the reasoning 
> behind it.

The problem with the above is that functions like the following become
well-typed:

void
foo (const_tree t)
{
  TYPE_MODE (element_type (t)) = VOIDmode;
}

even though element_type (t) could well be t.

One of the points of const_tree (and const pointer targets in general)
is to use the type system to enforce the promise that the value isn't
changed.

I guess the above is similar to the traditional problem with functions
like index and strstr, which take a const char * but return a char *.
So for example:

void
foo (const char *x)
{
  index (x, '.') = 0;
}

is well-typed.  But the equivalent C++ code (using iterators) would be
rejected.  If C allowed overloading them the correct prototypes would be:

const char *index (const char *, int);
char *index (char *, int);

And I think the same applies here.  Either we should provide two functions:

const_tree element_type (const_tree);
tree element_type (tree);

or just the latter.

Thanks,
Richard
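
A compilable illustration of the overload pair described above, using strchr, whose C++ form already follows this convention (names invented for the example):

#include <cstring>

// const input yields a const pointer; mutable input yields a mutable one,
// matching the suggested pair:
//   const_tree element_type (const_tree);
//   tree element_type (tree);
const char *
find_dot (const char *s)
{
  return std::strchr (s, '.');   // const overload of std::strchr
}

char *
find_dot (char *s)
{
  return std::strchr (s, '.');   // non-const overload
}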


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-29 Thread Andre Vieira (lists) via Gcc-patches

Hi Richard,

Thank you for the review, I've adopted all above suggestions downstream, 
I am still surprised how many style things I still miss after years of 
gcc development :(


On 17/12/2021 12:44, Richard Sandiford wrote:



@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
rhs_type = unsigned_type_node;
  }
  
-  int mask_opno = -1;

+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
if (internal_fn_p (cfn))
-mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+{
+  if (cfn == CFN_FTRUNC_INT)
+   /* For FTRUNC this represents the argument that carries the type of the
+  intermediate signed integer.  */
+   diff_opno = 1;
+  else
+   {
+ /* For masked operations this represents the argument that carries the
+mask.  */
+ diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+ masked = diff_opno >=  0;
+   }
+}

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.


I agree with this,  will change the loop to continue for argument 1 when 
dealing with non-masked CFN's.



}
[…]
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error 
*perr /* = NULL */)
return true;
  }
  
-/* Return the precision of the type, or for a complex or vector type the

-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
  
-unsigned int

-element_precision (const_tree type)
+tree
+element_type (const_tree type)
  {
if (!TYPE_P (type))
  type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
  type = TREE_TYPE (type);
  
-  return TYPE_PRECISION (type);

+  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
Was just curious about something here, I thought the purpose of having 
element_precision (before) and element_type (now) take a const_tree as 
an argument was to make it clear we aren't changing the input type. I 
understand that as it stands element_type could be an implicit 
const_cast (which I should be using rather than the '(tree)' cast), but 
that's only if 'type' is a type that isn't complex/vector, either way, 
we are conforming to the promise that we aren't changing the incoming 
type, what the caller then does with the result is up to them no?


I don't mind making the changes, just trying to understand the reasoning 
behind it.


I'll send in a new patch with all the changes after the review on the 
match.pd stuff.


Thanks,
Andre



Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-17 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists) via Gcc-patches"  writes:
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>> This is a good shout and made me think about something I hadn't before... I
>>> thought I could handle the vector forms later, but the problem is if I add
>>> support for the scalar, it will stop the vectorizer. It seems
>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>> work with passing the integer type as an operand workaround.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was 
> initially trying to make vectorizable_call and 
> vectorizable_internal_function handle IFNs with different inputs more 
> generically, using the information we have in the _direct structs 
> regarding what operands to get the modes from. Unfortunately, that 
> wasn't straightforward because of how vectorizable_call assumes operands 
> have the same type and uses the type of the DEF_STMT_INFO of the 
> non-constant operands (either output operand or non-constant inputs) to 
> determine the type of constants. I assume there is some reason why we 
> use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on 
> the argument types. That is why I ended up with this sort of half-way 
> mix of both, which still allows room to add more IFNs that don't take 
> inputs of the same type, but require adding a bit of special casing 
> similar to the IFN_FTRUNC_INT and masking ones.
>
> Bootstrapped on aarch64-none-linux.

Still leaving the match.pd stuff to Richard, but otherwise:

> index 
> bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF 
> "TARGET_SIMD_F16INST")
> SF DF])
>  
>  ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +   (V4SF "TARGET_SIMD")
> +   (V2DF "TARGET_SIMD")
> +   (DF "TARGET_FLOAT")
> +   (SF "TARGET_FLOAT")])
>  
>  ;; Advanced SIMD single Float modes.
>  (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar 
> or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not 
> fit

Maybe “An exception must be raised”?  “thrown” makes it sound like a
signal must be raised or C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
> != CODE_FOR_nothing);
>  }
>  
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +  tree_pair types,
> +  optimization_type opt_type)

Formatting nit: should be a line break after “bool”

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> + TYPE_MODE (element_type (types.second)),
> + opt_type) != CODE_FOR_nothing);
> +}
> +
>  #define direct_unary_optab_supported_p direct_optab_supported_p
>  #define direct_binary_optab_supported_p direct_optab_supported_p
>  #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c 
> b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 
> ..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches

ping

On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote:


On 22/11/2021 11:41, Richard Biener wrote:



On 18/11/2021 11:05, Richard Biener wrote:
This is a good shout and made me think about something I hadn't before... I
thought I could handle the vector forms later, but the problem is if I add
support for the scalar, it will stop the vectorizer. It seems
vectorizable_call expects all arguments to have the same type, which doesn't
work with the workaround of passing the integer type as an operand.

We already special case some IFNs there (masked load/store and gather)
to ignore some args, so that would just add to this set.

Richard.

Hi,

Reworked it to add support for the new IFN to the vectorizer. Was 
initially trying to make vectorizable_call and 
vectorizable_internal_function handle IFNs with different inputs more 
generically, using the information we have in the _direct structs 
regarding what operands to get the modes from. Unfortunately, that 
wasn't straightforward because of how vectorizable_call assumes 
operands have the same type and uses the type of the DEF_STMT_INFO of 
the non-constant operands (either output operand or non-constant 
inputs) to determine the type of constants. I assume there is some 
reason why we use the DEF_STMT_INFO and not always use 
get_vectype_for_scalar_type on the argument types. That is why I ended 
up with this sort of half-way mix of both, which still allows room to 
add more IFNs that don't take inputs of the same type, but require 
adding a bit of special casing similar to the IFN_FTRUNC_INT and 
masking ones.


Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTNZ): New iterator.
    (frintnz_mode): New int attribute.
    (VSFDF): Make iterator conditional.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * internal-fn.c (ftrunc_int_direct): New define.
    (expand_ftrunc_int_optab_fn): New custom expander.
    (direct_ftrunc_int_optab_supported_p): New supported_p.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.
    * stor-layout.h (element_precision): Moved from here...
    * tree.h (element_precision): ... to here.
    (element_type): New declaration.
    * tree.c (element_type): New function.
    (element_precision): Changed to use element_type.
    * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for

    IFNs with different input types.
    (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
    * doc/md.texi: New entry for ftrunc pattern name.
    * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if 
frintNz instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.
    * gcc.target/aarch64/frintnz_vec.c: New test.


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-29 Thread Andre Vieira (lists) via Gcc-patches



On 18/11/2021 11:05, Richard Biener wrote:


+ (if (!flag_trapping_math
+ && direct_internal_fn_supported_p (IFN_TRUNC, type,
+OPTIMIZE_FOR_BOTH))
+  (IFN_TRUNC @0)
  #endif

does IFN_FTRUNC_INT preserve the same exceptions as doing
explicit intermediate float->int conversions?  I think I'd
prefer to have !flag_trapping_math on both cases.
I realized I never responded to this. The AArch64 instructions mimic the
behaviour you'd see if you were doing explicit conversions, so I'll be
defining the new IFN and optab to require the same, such that they can
be used by the compiler even when flag_trapping_math is set. In the patch
I sent last I added some lines to the md.texi description of the optab to
that effect.
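
As an illustration of the case in question (my example, not from the
patch): for x = 3e9f, which does not fit in a 32-bit signed int, the
explicit conversion raises the invalid-operation exception:

  float
  f1 (float x)
  {
    int y = x;          /* 3e9f does not fit in 'int': raises FE_INVALID */
    return (float) y;
  }

frint32z has to raise the same FE_INVALID there, which is why the
transformation can stay valid even with trapping math enabled.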


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-25 Thread Andre Vieira (lists) via Gcc-patches


On 22/11/2021 11:41, Richard Biener wrote:



On 18/11/2021 11:05, Richard Biener wrote:

This is a good shout and made me think about something I hadn't before... I
thought I could handle the vector forms later, but the problem is if I add
support for the scalar, it will stop the vectorizer. It seems
vectorizable_call expects all arguments to have the same type, which doesn't
work with the workaround of passing the integer type as an operand.

We already special case some IFNs there (masked load/store and gather)
to ignore some args, so that would just add to this set.

Richard.

Hi,

Reworked it to add support for the new IFN to the vectorizer. Was 
initially trying to make vectorizable_call and 
vectorizable_internal_function handle IFNs with different inputs more 
generically, using the information we have in the _direct structs 
regarding what operands to get the modes from. Unfortunately, that 
wasn't straightforward because of how vectorizable_call assumes operands 
have the same type and uses the type of the DEF_STMT_INFO of the 
non-constant operands (either output operand or non-constant inputs) to 
determine the type of constants. I assume there is some reason why we 
use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on 
the argument types. That is why I ended up with this sort of half-way 
mix of both, which still allows room to add more IFNs that don't take 
inputs of the same type, but require adding a bit of special casing 
similar to the IFN_FTRUNC_INT and masking ones.


Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTNZ): New iterator.
    (frintnz_mode): New int attribute.
    (VSFDF): Make iterator conditional.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * internal-fn.c (ftrunc_int_direct): New define.
    (expand_ftrunc_int_optab_fn): New custom expander.
    (direct_ftrunc_int_optab_supported_p): New supported_p.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.
    * stor-layout.h (element_precision): Moved from here...
    * tree.h (element_precision): ... to here.
    (element_type): New declaration.
    * tree.c (element_type): New function.
    (element_precision): Changed to use element_type.
    * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for

    IFNs with different input types.
    (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
    * doc/md.texi: New entry for ftrunc pattern name.
    * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.
    * gcc.target/aarch64/frintnz_vec.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti"
(set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+ FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
  FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "\\t%0, %1"
   [(set_attr "type" "f_rint")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF 
"TARGET_SIMD_F16INST")
  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")
+ (V4SF "TARGET_SIMD")
+ (V2DF "TARGET_SIMD")
+ (DF "TARGET_FLOAT")
+ (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
   UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iter

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-22 Thread Richard Biener via Gcc-patches
On Mon, 22 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 18/11/2021 11:05, Richard Biener wrote:
> >
> > @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  trapping behaviour, so require !flag_trapping_math. */
> >   #if GIMPLE
> >   (simplify
> > -   (float (fix_trunc @0))
> > -   (if (!flag_trapping_math
> > -   && types_match (type, TREE_TYPE (@0))
> > -   && direct_internal_fn_supported_p (IFN_TRUNC, type,
> > - OPTIMIZE_FOR_BOTH))
> > -  (IFN_TRUNC @0)))
> > +   (float (fix_trunc@1 @0))
> > +   (if (types_match (type, TREE_TYPE (@0)))
> > +(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
> > +&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
> > +   TREE_TYPE (@1),
> > OPTIMIZE_FOR_BOTH))
> > + (with {
> > +  tree int_type = TREE_TYPE (@1);
> > +  unsigned HOST_WIDE_INT max_int_c
> > +   = (1ULL << (element_precision (int_type) - 1)) - 1;
> >
> > That's only half-way supporting vector types I fear - you use
> > element_precision but then build a vector integer constant
> > in an unsupported way.  I suppose vector support isn't present
> > for arm?  The cleanest way would probably be to do
> >
> > tree int_type = element_type (@1);
> >
> > with providing element_type in tree.[ch] like we provide
> > element_precision.
> This is a good shout and made me think about something I hadn't before... I
> thought I could handle the vector forms later, but the problem is if I add
> support for the scalar, it will stop the vectorizer. It seems
> vectorizable_call expects all arguments to have the same type, which doesn't
> work with the workaround of passing the integer type as an operand.

We already special case some IFNs there (masked load/store and gather)
to ignore some args, so that would just add to this set.

Richard.


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-22 Thread Andre Vieira (lists) via Gcc-patches



On 18/11/2021 11:05, Richard Biener wrote:


@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 trapping behaviour, so require !flag_trapping_math. */
  #if GIMPLE
  (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-   && types_match (type, TREE_TYPE (@0))
-   && direct_internal_fn_supported_p (IFN_TRUNC, type,
- OPTIMIZE_FOR_BOTH))
-  (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+   TREE_TYPE (@1),
OPTIMIZE_FOR_BOTH))
+ (with {
+  tree int_type = TREE_TYPE (@1);
+  unsigned HOST_WIDE_INT max_int_c
+   = (1ULL << (element_precision (int_type) - 1)) - 1;

That's only half-way supporting vector types I fear - you use
element_precision but then build a vector integer constant
in an unsupported way.  I suppose vector support isn't present
for arm?  The cleanest way would probably be to do

tree int_type = element_type (@1);

with providing element_type in tree.[ch] like we provide
element_precision.
This is a good shout and made me think about something I hadn't 
before... I thought I could handle the vector forms later, but the 
problem is if I add support for the scalar, it will stop the vectorizer. 
It seems vectorizable_call expects all arguments to have the same type, 
which doesn't work with the workaround of passing the integer type as an operand.


Should I go back to two separate IFN's, could still have the single optab.

Regards,
Andre



Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-18 Thread Richard Biener via Gcc-patches
On Wed, 17 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 16/11/2021 12:10, Richard Biener wrote:
> > On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
> >
> >> On 12/11/2021 10:56, Richard Biener wrote:
> >>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >>>
>  Hi,
> 
>  This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
>  optabs and mappings. It also creates a backend pattern to implement them
>  for
>  aarch64 and a match.pd pattern to idiom recognize these.
>  These IFN's (and optabs) represent a truncation towards zero, as if
>  performed
>  by first casting it to a signed integer of 32 or 64 bits and then back to
>  the
>  same floating point type/mode.
> 
>  The match.pd pattern chooses to use these, when supported, regardless of
>  trapping math, since these new patterns mimic the original behavior of
>  truncating through an integer.
> 
>  I didn't think any of the existing IFN's represented these. I know it's a
>  bit
>  late in stage 1, but I thought this might be OK given it's only used by a
>  single target and should have very little impact on anything else.
> 
>  Bootstrapped on aarch64-none-linux.
> 
>  OK for trunk?
> >>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> >>> optab (with two modes), so not
> >>>
> >>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> >>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >>>
> >>> but
> >>>
> >>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >>>
> >>> or so?  I know that gets somewhat awkward for the internal function,
> >>> but IMHO we shouldn't tie our hands because of that?
> >> I tried doing this originally, but indeed I couldn't find a way to
> >> correctly
> >> tie the internal function to it.
> >>
> >> direct_optab_supported_p with multiple types expects those to be of the same
> >> mode. I see convert_optab_supported_p does, but I don't know how that is
> >> used...
> >>
> >> Any ideas?
> > No "nice" ones.  The "usual" way is to provide fake arguments that
> > specify the type/mode.  We could use an integer argument directly
> > secifying the mode (then the IL would look host dependent - ugh),
> > or specify a constant zero in the intended mode (less visibly
> > obvious - but at least with -gimple dumping you'd see the type...).
> Hi,
> 
> So I reworked this to have a single optab and IFN. This required a bit of
> fiddling with custom expander and supported_p functions for the IFN. I decided
> to pass a MAX_INT for the 'int' type to the IFN to be able to pass on the size
> of the int we use as an intermediate cast.  I tried 0 first, but gcc was being
> too smart and just demoted it to an 'int' for the long long test-cases.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-   && types_match (type, TREE_TYPE (@0))
-   && direct_internal_fn_supported_p (IFN_TRUNC, type,
- OPTIMIZE_FOR_BOTH))
-  (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+   TREE_TYPE (@1),
OPTIMIZE_FOR_BOTH))
+ (with {
+  tree int_type = TREE_TYPE (@1);
+  unsigned HOST_WIDE_INT max_int_c
+   = (1ULL << (element_precision (int_type) - 1)) - 1;

That's only half-way supporting vector types I fear - you use
element_precision but then build a vector integer constant
in an unsupported way.  I suppose vector support isn't present
for arm?  The cleanest way would probably be to do

   tree int_type = element_type (@1);

with providing element_type in tree.[ch] like we provide
element_precision.

+  }
+  (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); }))

Then you could use wide_int_to_tree (int_type, wi::max_value 
(TYPE_PRECISION (int_type), SIGNED))
to build the special integer constant (which seems to be always
scalar).
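
I.e., inside the (with { ... }) block, roughly (a sketch, assuming the
element_type helper suggested above):

  tree int_type = element_type (@1);
  tree max_int_c = wide_int_to_tree (int_type,
                                     wi::max_value (TYPE_PRECISION (int_type),
                                                    SIGNED));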

+ (if (!flag_trapping_math
+ && direct_internal_fn_supported_p (IFN_TRUNC, type,
+OPTIMIZE_FOR_BOTH))
+  (IFN_TRUNC @0)
 #endif

does IFN_FTRUNC_INT preserve the same exceptions as doing
explicit intermediate float->int conversions?  I think I'd
prefer to have !flag_trapping_math on both cases.

> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
>     * config/aarch64/iterators.md (FRINTZ): New iterator.
>     * doc/md.texi: New entry for ftrunc pattern name.
>     * internal-fn.def (FTRUNC_INT): New IFN.
>     * match.pd: Add to the existing TRUNC pattern match.
>     * optabs.def (ftrunc_int): New entry.
> 
> gcc

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Richard Sandiford via Gcc-patches
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
> (set_attr "speculation_barrier" "true")]
>  )
>  
> +(define_expand "ftrunc<mode><frintnz_mode>2"
> +  [(set (match_operand:VSFDF 0 "register_operand" "=w")
> +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
> +   FRINTNZ))]
> +  "TARGET_FRINT && TARGET_FLOAT
> +   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
> +)

Probably just me, but this condition seems quite hard to read.
I think it'd be better to add conditions to the VSFDF definition instead,
a bit like we do for the HF entries in VHSDF_HSDF and VHSDF_DF.  I.e.:

(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
 (V4SF "TARGET_SIMD")
 (V2DF "TARGET_SIMD")
 (SF "TARGET_FLOAT")
 (DF "TARGET_FLOAT")])

Then the condition can be "TARGET_FRINT".

Same for the existing aarch64_<frintnzs_op><mode>.

> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
> +DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)

ftrunc_int should be described in the comment at the top of the file.
E.g.:

  - ftrunc_int: a unary conversion optab that takes and returns values
of the same mode, but internally converts via another mode.  This
second mode is specified using a dummy final function argument.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c 
> b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 
> ..2e1971f8aa11d8b95f454d03a03e050a3bf96747
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**   ...
> +**   frint32zs0, s0
> +**   ...

Are these functions ever more than just:

f1:
frint32zs0, s0
ret

?  If not, I think we should match that sequence and “defend” the
good codegen.  The problem with ... on both sides is that it's
then not clear why we can rely on register 0 being used.

> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   frint64zs0, s0
> +**   ...
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**   ...
> +**   frint32zd0, d0
> +**   ...
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**   ...
> +**   frint64zd0, d0
> +**   ...
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c 
> b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> index 
> 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -ffast-math" } */
> +/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
>  
>  float
>  f1 (float x)
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51
>  100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive 
> { } {
>   }]
>  }
>  
> +# Return 1 if the target supports ARMv8.

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Andre Vieira (lists) via Gcc-patches


On 16/11/2021 12:10, Richard Biener wrote:

On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:


On 12/11/2021 10:56, Richard Biener wrote:

On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:


Hi,

This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
optabs and mappings. It also creates a backend pattern to implement them
for
aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if
performed
by first casting it to a signed integer of 32 or 64 bits and then back to
the
same floating point type/mode.

The match.pd pattern chooses to use these, when supported, regardless of
trapping math, since these new patterns mimic the original behavior of
truncating through an integer.

I didn't think any of the existing IFN's represented these. I know it's a
bit
late in stage 1, but I thought this might be OK given it's only used by a
single target and should have very little impact on anything else.

Bootstrapped on aarch64-none-linux.

OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?

I tried doing this originally, but indeed I couldn't find a way to correctly
tie the internal function to it.

> direct_optab_supported_p with multiple types expects those to be of the same
> mode. I see convert_optab_supported_p does, but I don't know how that is
used...

Any ideas?

No "nice" ones.  The "usual" way is to provide fake arguments that
specify the type/mode.  We could use an integer argument directly
specifying the mode (then the IL would look host dependent - ugh),
or specify a constant zero in the intended mode (less visibly
obvious - but at least with -gimple dumping you'd see the type...).

Hi,

So I reworked this to have a single optab and IFN. This required a bit 
of fiddling with custom expander and supported_p functions for the IFN. 
I decided to pass a MAX_INT for the 'int' type to the IFN to be able to 
pass on the size of the int we use as an intermediate cast.  I tried 0 
first, but gcc was being too smart and just demoted it to an 'int' for 
the long long test-cases.
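
The custom expander then only inspects the dummy operand's type; a
minimal sketch of the idea (simplified, not the actual
expand_ftrunc_int_optab_fn body):

  static void
  expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  {
    tree lhs = gimple_call_lhs (stmt);
    tree int_arg = gimple_call_arg (stmt, 1);
    /* The dummy MAX_INT operand matters only for its TYPE_MODE, which
       selects the intermediate integer mode in the conversion optab.  */
    insn_code icode = convert_optab_handler (optab,
                                             TYPE_MODE (TREE_TYPE (lhs)),
                                             TYPE_MODE (TREE_TYPE (int_arg)));
    gcc_assert (icode != CODE_FOR_nothing);
    /* ... expand the operands and emit the insn as usual ...  */
  }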


Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTZ): New iterator.
    * doc/md.texi: New entry for ftrunc pattern name.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
(set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+ FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+)
+
 (define_insn "aarch64_"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
   UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") 
(UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X 
"frint32x")
  (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X 
"frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 (UNSPEC_COND_CMPG

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-16 Thread Richard Biener via Gcc-patches
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:

> 
> On 12/11/2021 10:56, Richard Biener wrote:
> > On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> >> optabs and mappings. It also creates a backend pattern to implement them
> >> for
> >> aarch64 and a match.pd pattern to idiom recognize these.
> >> These IFN's (and optabs) represent a truncation towards zero, as if
> >> performed
> >> by first casting it to a signed integer of 32 or 64 bits and then back to
> >> the
> >> same floating point type/mode.
> >>
> >> The match.pd pattern chooses to use these, when supported, regardless of
> >> trapping math, since these new patterns mimic the original behavior of
> >> truncating through an integer.
> >>
> >> I didn't think any of the existing IFN's represented these. I know it's a
> >> bit
> >> late in stage 1, but I thought this might be OK given it's only used by a
> >> single target and should have very little impact on anything else.
> >>
> >> Bootstrapped on aarch64-none-linux.
> >>
> >> OK for trunk?
> > On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> > optab (with two modes), so not
> >
> > +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> > +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >
> > but
> >
> > OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >
> > or so?  I know that gets somewhat awkward for the internal function,
> > but IMHO we shouldn't tie our hands because of that?
> I tried doing this originally, but indeed I couldn't find a way to correctly
> tie the internal function to it.
> 
> direct_optab_supported_p with multiple types expects those to be of the same
> mode. I see convert_optab_supported_p does, but I don't know how that is
> used...
> 
> Any ideas?

No "nice" ones.  The "usual" way is to provide fake arguments that
specify the type/mode.  We could use an integer argument directly
specifying the mode (then the IL would look host dependent - ugh),
or specify a constant zero in the intended mode (less visibly
obvious - but at least with -gimple dumping you'd see the type...).

In any case if people think going with two optabs is OK then
please consider using ftruncsi and ftruncdi instead of 32/64.

Richard.


Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-12 Thread Andre Simoes Dias Vieira via Gcc-patches



On 12/11/2021 10:56, Richard Biener wrote:

On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:


Hi,

This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
optabs and mappings. It also creates a backend pattern to implement them for
aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if performed
by first casting it to a signed integer of 32 or 64 bits and then back to the
same floating point type/mode.

The match.pd pattern chooses to use these, when supported, regardless of
trapping math, since these new patterns mimic the original behavior of
truncating through an integer.

I didn't think any of the existing IFN's represented these. I know it's a bit
late in stage 1, but I thought this might be OK given it's only used by a
single target and should have very little impact on anything else.

Bootstrapped on aarch64-none-linux.

OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?
I tried doing this originally, but indeed I couldn't find a way to 
correctly tie the internal function to it.


direct_optab_supported_p with multiple types expects those to be of the
same mode. I see convert_optab_supported_p does, but I don't know how
that is used...


Any ideas?



Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-12 Thread Richard Biener via Gcc-patches
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> optabs and mappings. It also creates a backend pattern to implement them for
> aarch64 and a match.pd pattern to idiom recognize these.
> These IFN's (and optabs) represent a truncation towards zero, as if performed
> by first casting it to a signed integer of 32 or 64 bits and then back to the
> same floating point type/mode.
> 
> The match.pd pattern chooses to use these, when supported, regardless of
> trapping math, since these new patterns mimic the original behavior of
> truncating through an integer.
> 
> I didn't think any of the existing IFN's represented these. I know it's a bit
> late in stage 1, but I thought this might be OK given it's only used by a
> single target and should have very little impact on anything else.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?

Richard.


> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
>     * config/aarch64/iterators.md (FRINTZ): New iterator.
>     * doc/md.texi: New entry for ftrunc pattern name.
>     * internal-fn.def (FTRUNC32): New IFN.
>     (FTRUNC64): Likewise.
>     * match.pd: Add to the existing TRUNC pattern match.
>     * optabs.def (OPTAB_D): New entries for ftrunc.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>     * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>     * gcc.target/aarch64/frintnz.c: New test.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[AArch64] Enable generation of FRINTNZ instructions

2021-11-11 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding 
optabs and mappings. It also creates a backend pattern to implement them 
for aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if 
performed by first casting it to a signed integer of 32 or 64 bits and 
then back to the same floating point type/mode.


The match.pd pattern chooses to use these, when supported, regardless of 
trapping math, since these new patterns mimic the original behavior of 
truncating through an integer.
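
For the scalar case the recognized idiom is simply this (essentially
what the new frintnz.c tests below check):

  double
  f3 (double x)
  {
    int y = x;           /* truncate towards zero via 32-bit signed int */
    return (double) y;   /* back to the same floating-point mode */
  }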


I didn't think any of the existing IFN's represented these. I know it's 
a bit late in stage 1, but I thought this might be OK given it's only 
used by a single target and should have very little impact on anything else.


Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.

    * config/aarch64/iterators.md (FRINTZ): New iterator.
    * doc/md.texi: New entry for ftrunc pattern name.
    * internal-fn.def (FTRUNC32): New IFN.
    (FTRUNC64): Likewise.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (OPTAB_D): New entries for ftrunc.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
4035e061706793849c68ae09bcb2e4b9580ab7b6..ad4e04d7c874da095513442e7d7f247791d8921d
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,16 @@ (define_insn "despeculate_simpleti"
(set_attr "speculation_barrier" "true")]
 )
 
+(define_insn "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+ FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "\\t%0, %1"
+  [(set_attr "type" "f_rint")]
+)
+
 (define_insn "aarch64_"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
   UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") 
(UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X 
"frint32x")
  (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X 
"frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,13 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
bb13c6cce1bf55633760bc14980402f1f0ac1689..64263cbb83548b140f613cb4bf5ce6565373f96d
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -269,6 +269,8 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC32, ECF_CONST, ftrunc32, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC64, ECF_CONST, ftrunc64, unary)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 
a319aefa8081ac177981ad425c461f8a771128f4..