Re: [AArch64] Enable generation of FRINTNZ instructions
On Tue, 15 Nov 2022, Richard Sandiford wrote: > "Andre Vieira (lists)" writes: > > On 07/11/2022 11:05, Richard Biener wrote: > >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: > >> > >>> Sorry for the delay, just been reminded I still had this patch outstanding > >>> from last stage 1. Hopefully since it has been mostly reviewed it could > >>> go in > >>> for this stage 1? > >>> > >>> I addressed the comments and gave the slp-part of vectorizable_call some > >>> TLC > >>> to make it work. > >>> > >>> I also changed vect_get_slp_defs as I noticed that the call from > >>> vectorizable_call was creating an auto_vec with 'nargs' that might be less > >>> than the number of children in the slp_node > >> how so? Please fix that in the caller. It looks like it probably > >> shoud use vect_nargs instead? > > Well that was my first intuition, but when I looked at it further the > > variant it's calling: > > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec > > > *vec_oprnds, unsigned n) > > > > Is actually creating a vector of vectors of slp defs. So for each child > > of slp_node it calls: > > void vect_get_slp_defs (slp_tree slp_node, vec *vec_defs) > > > > Which returns a vector of vectorized defs. So vect_nargs would be the > > right size for the inner vec of vec_defs, but the outer should > > have the same number of elements as the original slp_node has children. > > > > However, at the call site (vectorizable_call), the operand we pass to > > vect_get_slp_defs 'vec_defs', is initialized before the code-path is > > specialized for slp_node. I'll go see if I can change the call site to > > not have to do that, given the continue at the end of the if (slp_node) > > BB I don't think it needs to use vec_defs after it, but it may require > > some massaging to be able to define it separately for each code-path. > > > >> > >>> , so that quick_push might not be > >>> safe as is, so I added the reserve (n) to ensure it's safe to push. 
I > >>> didn't > >>> actually come across any failure because of it though. Happy to split this > >>> into a separate patch if needed. > >>> > >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and > >>> x86_64-pc-linux-gnu. > >>> > >>> OK for trunk? > >> I'll leave final approval to Richard but > >> > >> - This only needs 1 bit, but occupies the full 16 to ensure a nice > >> + This only needs 1 bit, but occupies the full 15 to ensure a nice > >>layout. */ > >> unsigned int vectorizable : 16; > >> > >> you don't actually change the width of the bitfield. I would find > >> it more natural to have > >> > >>signed int type0 : 7; > >>signed int type0_vtrans : 1; > >>signed int type1 : 7; > >>signed int type1_vtrans : 1; > >> > >> with typeN_vtrans specifying how the types transform when vectorized. > >> I would imagine another variant we could need is narrow/widen > >> according to either result or other argument type? That said, > >> just your flag would then be > >> > >>signed int type0 : 7; > >>signed int pad : 1; > >>signed int type1 : 7; > >>signed int type1_vect_as_scalar : 1; > >> > >> ? > > That's a cool idea! I'll leave it as a single bit for now like that, if > > we want to re-use it for multiple transformations we will obviously need > > to rename & give it more bits. > > I think we should steal bits from vectorizable rather than shrink > type0 and type1 though. Then add a 14-bit padding field to show > how many bits are left. > > > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo, > >rhs_type = unsigned_type_node; > > } > > > > + /* The argument that is not of the same type as the others. 
*/ > >int mask_opno = -1; > > + int scalar_opno = -1; > >if (internal_fn_p (cfn)) > > -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); > > +{ > > + internal_fn ifn = as_internal_fn (cfn); > > + if (direct_internal_fn_p (ifn) > > + && direct_internal_fn (ifn).type1_is_scalar_p) > > + scalar_opno = direct_internal_fn (ifn).type1; > > + else > > + /* For masked operations this represents the argument that carries the > > + mask. */ > > + mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); > > This doesn't seem logically like an else. We should do both. > > LGTM otherwise for the bits outside match.pd. If Richard's happy with > the match.pd bits then I think the patch is OK with those changes and > without the vect_get_slp_defs thing (as you mentioned downthread). Yes, the match.pd part looked OK. > Thanks, > Richard > > > >> > >>> gcc/ChangeLog: > >>> > >>> * config/aarch64/aarch64.md (ftrunc2): New > >>> pattern. > >>> * config/aarch64/iterators.md (FRINTNZ): New iterator. > >>> (frintnz_mode): New int attribute. > >>> (VSFDF): Make iterator conditional. > >>> * internal-fn.def (FTRUNC_INT): New IFN. > >>> * internal-fn.cc (ftrunc_int_direct): Ne
Re: [AArch64] Enable generation of FRINTNZ instructions
"Andre Vieira (lists)" writes: > On 07/11/2022 11:05, Richard Biener wrote: >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: >> >>> Sorry for the delay, just been reminded I still had this patch outstanding >>> from last stage 1. Hopefully since it has been mostly reviewed it could go >>> in >>> for this stage 1? >>> >>> I addressed the comments and gave the slp-part of vectorizable_call some TLC >>> to make it work. >>> >>> I also changed vect_get_slp_defs as I noticed that the call from >>> vectorizable_call was creating an auto_vec with 'nargs' that might be less >>> than the number of children in the slp_node >> how so? Please fix that in the caller. It looks like it probably >> shoud use vect_nargs instead? > Well that was my first intuition, but when I looked at it further the > variant it's calling: > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec > > *vec_oprnds, unsigned n) > > Is actually creating a vector of vectors of slp defs. So for each child > of slp_node it calls: > void vect_get_slp_defs (slp_tree slp_node, vec *vec_defs) > > Which returns a vector of vectorized defs. So vect_nargs would be the > right size for the inner vec of vec_defs, but the outer should > have the same number of elements as the original slp_node has children. > > However, at the call site (vectorizable_call), the operand we pass to > vect_get_slp_defs 'vec_defs', is initialized before the code-path is > specialized for slp_node. I'll go see if I can change the call site to > not have to do that, given the continue at the end of the if (slp_node) > BB I don't think it needs to use vec_defs after it, but it may require > some massaging to be able to define it separately for each code-path. > >> >>> , so that quick_push might not be >>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't >>> actually come across any failure because of it though. Happy to split this >>> into a separate patch if needed. 
>>> >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and >>> x86_64-pc-linux-gnu. >>> >>> OK for trunk? >> I'll leave final approval to Richard but >> >> - This only needs 1 bit, but occupies the full 16 to ensure a nice >> + This only needs 1 bit, but occupies the full 15 to ensure a nice >>layout. */ >> unsigned int vectorizable : 16; >> >> you don't actually change the width of the bitfield. I would find >> it more natural to have >> >>signed int type0 : 7; >>signed int type0_vtrans : 1; >>signed int type1 : 7; >>signed int type1_vtrans : 1; >> >> with typeN_vtrans specifying how the types transform when vectorized. >> I would imagine another variant we could need is narrow/widen >> according to either result or other argument type? That said, >> just your flag would then be >> >>signed int type0 : 7; >>signed int pad : 1; >>signed int type1 : 7; >>signed int type1_vect_as_scalar : 1; >> >> ? > That's a cool idea! I'll leave it as a single bit for now like that, if > we want to re-use it for multiple transformations we will obviously need > to rename & give it more bits. I think we should steal bits from vectorizable rather than shrink type0 and type1 though. Then add a 14-bit padding field to show how many bits are left. > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo, >rhs_type = unsigned_type_node; > } > > + /* The argument that is not of the same type as the others. */ >int mask_opno = -1; > + int scalar_opno = -1; >if (internal_fn_p (cfn)) > -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); > +{ > + internal_fn ifn = as_internal_fn (cfn); > + if (direct_internal_fn_p (ifn) > + && direct_internal_fn (ifn).type1_is_scalar_p) > + scalar_opno = direct_internal_fn (ifn).type1; > + else > + /* For masked operations this represents the argument that carries the > +mask. */ > + mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); This doesn't seem logically like an else. We should do both. 
LGTM otherwise for the bits outside match.pd. If Richard's happy with the match.pd bits then I think the patch is OK with those changes and without the vect_get_slp_defs thing (as you mentioned downthread). Thanks, Richard >> >>> gcc/ChangeLog: >>> >>> * config/aarch64/aarch64.md (ftrunc2): New >>> pattern. >>> * config/aarch64/iterators.md (FRINTNZ): New iterator. >>> (frintnz_mode): New int attribute. >>> (VSFDF): Make iterator conditional. >>> * internal-fn.def (FTRUNC_INT): New IFN. >>> * internal-fn.cc (ftrunc_int_direct): New define. >>> (expand_ftrunc_int_optab_fn): New custom expander. >>> (direct_ftrunc_int_optab_supported_p): New supported_p. >>> * internal-fn.h (direct_internal_fn_info): Add new member >>> type1_is_scalar_p. >>> * match.pd: Add to the existing TRUNC pattern match. >>>
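Richard's point that finding a scalar operand should not suppress the mask lookup can be illustrated with a small self-contained sketch. The struct and helper below are hypothetical stand-ins for GCC's direct_internal_fn_info machinery, not the real API:

```cpp
#include <cassert>

// Hypothetical stand-in for direct_internal_fn_info: an internal fn may
// have a scalar-typed operand (type1_is_scalar_p) and, independently, a
// mask operand.
struct ifn_info
{
  int type1;              // operand index the second type describes
  bool type1_is_scalar_p; // the new flag proposed in the patch
  int mask_index;         // mask operand index, or -1 if none
};

// Compute both indices.  The review point: this is deliberately NOT an
// if/else -- a masked operation could also carry a scalar operand, so
// both lookups must run.
void
classify_operands (const ifn_info *info, int *scalar_opno, int *mask_opno)
{
  *scalar_opno = -1;
  *mask_opno = -1;
  if (info->type1_is_scalar_p)
    *scalar_opno = info->type1;
  // Not an 'else': do both.
  *mask_opno = info->mask_index;
}
```

Calling classify_operands on a function that has both a scalar operand and a mask yields two valid indices, which the if/else version in the posted patch would not.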
Re: [AArch64] Enable generation of FRINTNZ instructions
On 07/11/2022 14:56, Richard Biener wrote:
> On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:
>> On 07/11/2022 11:05, Richard Biener wrote:
>>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>>> Sorry for the delay, just been reminded I still had this patch
>>>> outstanding from last stage 1. Hopefully since it has been mostly
>>>> reviewed it could go in for this stage 1?
>>>>
>>>> I addressed the comments and gave the slp-part of vectorizable_call
>>>> some TLC to make it work.
>>>>
>>>> I also changed vect_get_slp_defs as I noticed that the call from
>>>> vectorizable_call was creating an auto_vec with 'nargs' that might
>>>> be less than the number of children in the slp_node
>>> how so?  Please fix that in the caller.  It looks like it probably
>>> should use vect_nargs instead?
>> Well that was my first intuition, but when I looked at it further the
>> variant it's calling:
>> void vect_get_slp_defs (vec_info *, slp_tree slp_node,
>>                         vec<vec<tree> > *vec_oprnds, unsigned n)
>>
>> Is actually creating a vector of vectors of slp defs. So for each
>> child of slp_node it calls:
>> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>>
>> Which returns a vector of vectorized defs. So vect_nargs would be the
>> right size for the inner vec<tree> of vec_defs, but the outer should
>> have the same number of elements as the original slp_node has
>> children.
> No, the inner vector is the vector of vectors for each arg, the outer
> vector should be the one for each argument.  Hm, that was a confusing
> sentence.  That said, the number of SLP children of a call node should
> eventually be the number of arguments of the call (plus masks, etc.).
> So it looks about correct besides the vec_nargs issue?

Yeah, you are right, I misunderstood what the children were: there is a
child node for each argument of the call. Though, since we iterate over
the 'scalar' arguments of the call, I actually think 'nargs' was correct
to begin with, which would explain why this never went wrong.

So I think it is actually correct as is; I must have gotten confused by
some earlier investigation into how to deal with the scalar arguments.
Sorry for the noise, I'll undo these changes to the patch.
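For reference, the shape being discussed can be modelled with std::vector as a stand-in for GCC's vec<> (an analogy only, not the real API): the outer vector has one slot per call argument, i.e. per SLP child, and each inner vector holds that argument's vectorized defs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a vectorized def (a tree node in GCC proper).
using vec_def = std::string;

// Analogy for vect_get_slp_defs: one inner vector per call argument
// (= per SLP child of the call node); each inner vector holds the
// vectorized defs for that argument, one per vector copy.
std::vector<std::vector<vec_def>>
get_slp_defs (unsigned nargs, unsigned copies_per_arg)
{
  std::vector<std::vector<vec_def>> vec_defs;
  vec_defs.reserve (nargs); // the reserve the posted patch added
  for (unsigned arg = 0; arg < nargs; ++arg)
    {
      std::vector<vec_def> defs_for_arg;
      for (unsigned copy = 0; copy < copies_per_arg; ++copy)
        defs_for_arg.push_back ("arg" + std::to_string (arg)
                                + "_copy" + std::to_string (copy));
      vec_defs.push_back (defs_for_arg);
    }
  return vec_defs;
}
```

In these terms the thread's conclusion is that sizing the outer vector by the number of scalar call arguments ('nargs') was already right, since each argument maps to one SLP child.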
Re: [AArch64] Enable generation of FRINTNZ instructions
On Mon, 7 Nov 2022, Andre Vieira (lists) wrote: > > On 07/11/2022 11:05, Richard Biener wrote: > > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: > > > >> Sorry for the delay, just been reminded I still had this patch outstanding > >> from last stage 1. Hopefully since it has been mostly reviewed it could go > >> in > >> for this stage 1? > >> > >> I addressed the comments and gave the slp-part of vectorizable_call some > >> TLC > >> to make it work. > >> > >> I also changed vect_get_slp_defs as I noticed that the call from > >> vectorizable_call was creating an auto_vec with 'nargs' that might be less > >> than the number of children in the slp_node > > how so? Please fix that in the caller. It looks like it probably > > shoud use vect_nargs instead? > Well that was my first intuition, but when I looked at it further the variant > it's calling: > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec > > *vec_oprnds, unsigned n) > > Is actually creating a vector of vectors of slp defs. So for each child of > slp_node it calls: > void vect_get_slp_defs (slp_tree slp_node, vec *vec_defs) > > Which returns a vector of vectorized defs. So vect_nargs would be the right > size for the inner vec of vec_defs, but the outer should have the same > number of elements as the original slp_node has children. No, the inner vector is the vector of vectors for each arg, the outer vector should be the one for each argument. Hm, that was a confusing sentence. That said, the number of SLP children of a call node should eventually be the number of arguments of the call (plus masks, etc.). So it looks about correct besides the vec_nargs issue? > > However, at the call site (vectorizable_call), the operand we pass to > vect_get_slp_defs 'vec_defs', is initialized before the code-path is > specialized for slp_node. 
I'll go see if I can change the call site to not > have to do that, given the continue at the end of the if (slp_node) BB I don't > think it needs to use vec_defs after it, but it may require some massaging to > be able to define it separately for each code-path. > > > > >> , so that quick_push might not be > >> safe as is, so I added the reserve (n) to ensure it's safe to push. I > >> didn't > >> actually come across any failure because of it though. Happy to split this > >> into a separate patch if needed. > >> > >> Bootstrapped and regression tested on aarch64-none-linux-gnu and > >> x86_64-pc-linux-gnu. > >> > >> OK for trunk? > > I'll leave final approval to Richard but > > > > - This only needs 1 bit, but occupies the full 16 to ensure a nice > > + This only needs 1 bit, but occupies the full 15 to ensure a nice > >layout. */ > > unsigned int vectorizable : 16; > > > > you don't actually change the width of the bitfield. I would find > > it more natural to have > > > >signed int type0 : 7; > >signed int type0_vtrans : 1; > >signed int type1 : 7; > >signed int type1_vtrans : 1; > > > > with typeN_vtrans specifying how the types transform when vectorized. > > I would imagine another variant we could need is narrow/widen > > according to either result or other argument type? That said, > > just your flag would then be > > > >signed int type0 : 7; > >signed int pad : 1; > >signed int type1 : 7; > >signed int type1_vect_as_scalar : 1; > > > > ? > That's a cool idea! I'll leave it as a single bit for now like that, if we > want to re-use it for multiple transformations we will obviously need to > rename & give it more bits. > > > >> gcc/ChangeLog: > >> > >> * config/aarch64/aarch64.md (ftrunc2): New > >> pattern. > >> * config/aarch64/iterators.md (FRINTNZ): New iterator. > >> (frintnz_mode): New int attribute. > >> (VSFDF): Make iterator conditional. > >> * internal-fn.def (FTRUNC_INT): New IFN. > >> * internal-fn.cc (ftrunc_int_direct): New define. 
> >> (expand_ftrunc_int_optab_fn): New custom expander. > >> (direct_ftrunc_int_optab_supported_p): New supported_p. > >> * internal-fn.h (direct_internal_fn_info): Add new member > >> type1_is_scalar_p. > >> * match.pd: Add to the existing TRUNC pattern match. > >> * optabs.def (ftrunc_int): New entry. > >> * stor-layout.h (element_precision): Moved from here... > >> * tree.h (element_precision): ... to here. > >> (element_type): New declaration. > >> * tree.cc (element_type): New function. > >> (element_precision): Changed to use element_type. > >> * tree-vect-stmts.cc (vectorizable_internal_function): Add > >> support for > >> IFNs with different input types. > >> (vect_get_scalar_oprnds): New function. > >> (vectorizable_call): Teach to handle IFN_FTRUNC_INT. > >> * tree-vect-slp.cc (check_scalar_arg_ok): New function. > >> (vect_slp_analyze_node_operations): Use check_scalar_arg_ok. > >>
Re: [AArch64] Enable generation of FRINTNZ instructions
On 07/11/2022 11:05, Richard Biener wrote:
> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>
>> Sorry for the delay, just been reminded I still had this patch
>> outstanding from last stage 1. Hopefully since it has been mostly
>> reviewed it could go in for this stage 1?
>>
>> I addressed the comments and gave the slp-part of vectorizable_call
>> some TLC to make it work.
>>
>> I also changed vect_get_slp_defs as I noticed that the call from
>> vectorizable_call was creating an auto_vec with 'nargs' that might be
>> less than the number of children in the slp_node
> how so?  Please fix that in the caller.  It looks like it probably
> should use vect_nargs instead?

Well that was my first intuition, but when I looked at it further the
variant it's calling:
void vect_get_slp_defs (vec_info *, slp_tree slp_node,
                        vec<vec<tree> > *vec_oprnds, unsigned n)

Is actually creating a vector of vectors of slp defs. So for each child
of slp_node it calls:
void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)

Which returns a vector of vectorized defs. So vect_nargs would be the
right size for the inner vec<tree> of vec_defs, but the outer should
have the same number of elements as the original slp_node has children.

However, at the call site (vectorizable_call), the operand we pass to
vect_get_slp_defs, 'vec_defs', is initialized before the code-path is
specialized for slp_node. I'll go see if I can change the call site to
not have to do that; given the continue at the end of the if (slp_node)
BB I don't think it needs to use vec_defs after it, but it may require
some massaging to be able to define it separately for each code-path.

>> , so that quick_push might not be safe as is, so I added the
>> reserve (n) to ensure it's safe to push. I didn't actually come
>> across any failure because of it though. Happy to split this into a
>> separate patch if needed.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>> x86_64-pc-linux-gnu.
>>
>> OK for trunk?
> I'll leave final approval to Richard but
>
>   -     This only needs 1 bit, but occupies the full 16 to ensure a nice
>   +     This only needs 1 bit, but occupies the full 15 to ensure a nice
>         layout.  */
>     unsigned int vectorizable : 16;
>
> you don't actually change the width of the bitfield.  I would find
> it more natural to have
>
>    signed int type0 : 7;
>    signed int type0_vtrans : 1;
>    signed int type1 : 7;
>    signed int type1_vtrans : 1;
>
> with typeN_vtrans specifying how the types transform when vectorized.
> I would imagine another variant we could need is narrow/widen
> according to either result or other argument type?  That said,
> just your flag would then be
>
>    signed int type0 : 7;
>    signed int pad : 1;
>    signed int type1 : 7;
>    signed int type1_vect_as_scalar : 1;
>
> ?

That's a cool idea! I'll leave it as a single bit for now like that; if
we want to re-use it for multiple transformations we will obviously need
to rename & give it more bits.

gcc/ChangeLog:

        * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
        pattern.
        * config/aarch64/iterators.md (FRINTNZ): New iterator.
        (frintnz_mode): New int attribute.
        (VSFDF): Make iterator conditional.
        * internal-fn.def (FTRUNC_INT): New IFN.
        * internal-fn.cc (ftrunc_int_direct): New define.
        (expand_ftrunc_int_optab_fn): New custom expander.
        (direct_ftrunc_int_optab_supported_p): New supported_p.
        * internal-fn.h (direct_internal_fn_info): Add new member
        type1_is_scalar_p.
        * match.pd: Add to the existing TRUNC pattern match.
        * optabs.def (ftrunc_int): New entry.
        * stor-layout.h (element_precision): Moved from here...
        * tree.h (element_precision): ... to here.
        (element_type): New declaration.
        * tree.cc (element_type): New function.
        (element_precision): Changed to use element_type.
        * tree-vect-stmts.cc (vectorizable_internal_function): Add
        support for IFNs with different input types.
        (vect_get_scalar_oprnds): New function.
        (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
        * tree-vect-slp.cc (check_scalar_arg_ok): New function.
        (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
        (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
        * doc/md.texi: New entry for ftrunc pattern name.
        * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
        instructions available.
        * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
        aarch64_frintz options.
        * gcc.target/aarch64/frintnz.c: New test.
        * gcc.target/aarch64/frintnz_vec.c: New test.
        * gcc.target/aarch64/frintnz_slp.c: New test.
Re: [AArch64] Enable generation of FRINTNZ instructions
On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: > Sorry for the delay, just been reminded I still had this patch outstanding > from last stage 1. Hopefully since it has been mostly reviewed it could go in > for this stage 1? > > I addressed the comments and gave the slp-part of vectorizable_call some TLC > to make it work. > > I also changed vect_get_slp_defs as I noticed that the call from > vectorizable_call was creating an auto_vec with 'nargs' that might be less > than the number of children in the slp_node how so? Please fix that in the caller. It looks like it probably shoud use vect_nargs instead? > , so that quick_push might not be > safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't > actually come across any failure because of it though. Happy to split this > into a separate patch if needed. > > Bootstrapped and regression tested on aarch64-none-linux-gnu and > x86_64-pc-linux-gnu. > > OK for trunk? I'll leave final approval to Richard but - This only needs 1 bit, but occupies the full 16 to ensure a nice + This only needs 1 bit, but occupies the full 15 to ensure a nice layout. */ unsigned int vectorizable : 16; you don't actually change the width of the bitfield. I would find it more natural to have signed int type0 : 7; signed int type0_vtrans : 1; signed int type1 : 7; signed int type1_vtrans : 1; with typeN_vtrans specifying how the types transform when vectorized. I would imagine another variant we could need is narrow/widen according to either result or other argument type? That said, just your flag would then be signed int type0 : 7; signed int pad : 1; signed int type1 : 7; signed int type1_vect_as_scalar : 1; ? > gcc/ChangeLog: > > * config/aarch64/aarch64.md (ftrunc2): New > pattern. > * config/aarch64/iterators.md (FRINTNZ): New iterator. > (frintnz_mode): New int attribute. > (VSFDF): Make iterator conditional. > * internal-fn.def (FTRUNC_INT): New IFN. > * internal-fn.cc (ftrunc_int_direct): New define. 
> (expand_ftrunc_int_optab_fn): New custom expander. > (direct_ftrunc_int_optab_supported_p): New supported_p. > * internal-fn.h (direct_internal_fn_info): Add new member > type1_is_scalar_p. > * match.pd: Add to the existing TRUNC pattern match. > * optabs.def (ftrunc_int): New entry. > * stor-layout.h (element_precision): Moved from here... > * tree.h (element_precision): ... to here. > (element_type): New declaration. > * tree.cc (element_type): New function. > (element_precision): Changed to use element_type. > * tree-vect-stmts.cc (vectorizable_internal_function): Add > support for > IFNs with different input types. > (vect_get_scalar_oprnds): New function. > (vectorizable_call): Teach to handle IFN_FTRUNC_INT. > * tree-vect-slp.cc (check_scalar_arg_ok): New function. > (vect_slp_analyze_node_operations): Use check_scalar_arg_ok. > (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push. > * doc/md.texi: New entry for ftrunc pattern name. > * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz > instructions available. > * lib/target-supports.exp: Added aarch64_frintnzx_ok target and > aarch64_frintz options. > * gcc.target/aarch64/frintnz.c: New test. > * gcc.target/aarch64/frintnz_vec.c: New test. > * gcc.target/aarch64/frintnz_slp.c: New test. > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
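The bitfield layout Richard sketches above can be written down concretely. The struct below is an illustrative model only (field names taken from the discussion, not the final GCC definition), carving the single-bit flag out of the original 16-bit field:

```cpp
#include <cassert>

// Illustrative model of the proposed direct_internal_fn_info layout:
// two 7-bit signed operand-type fields, one bit flagging that type1
// stays scalar when the call is vectorized, and one padding bit left
// over from the original 16 bits of 'vectorizable'.
struct ifn_type_info
{
  signed int type0 : 7;
  signed int pad : 1;
  signed int type1 : 7;
  signed int type1_vect_as_scalar : 1;
};

// 7-bit signed fields hold operand indices in [-64, 63]; a 1-bit signed
// bitfield can only represent 0 and -1, so the flag is stored as -1 for
// "set" and tested for non-zero.
ifn_type_info
make_info (int type0, int type1, bool type1_scalar)
{
  ifn_type_info info = {};
  info.type0 = type0;
  info.type1 = type1;
  info.type1_vect_as_scalar = type1_scalar ? -1 : 0;
  return info;
}
```

The four fields total 16 bits, so on common ABIs the struct still packs into the same allocation unit the old 16-bit `vectorizable` field used.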
Re: [AArch64] Enable generation of FRINTNZ instructions
Sorry for the delay, just been reminded I still had this patch
outstanding from last stage 1. Hopefully since it has been mostly
reviewed it could go in for this stage 1?

I addressed the comments and gave the slp-part of vectorizable_call some
TLC to make it work.

I also changed vect_get_slp_defs as I noticed that the call from
vectorizable_call was creating an auto_vec with 'nargs' that might be
less than the number of children in the slp_node, so that quick_push
might not be safe as is, so I added the reserve (n) to ensure it's safe
to push. I didn't actually come across any failure because of it though.
Happy to split this into a separate patch if needed.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:

        * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
        pattern.
        * config/aarch64/iterators.md (FRINTNZ): New iterator.
        (frintnz_mode): New int attribute.
        (VSFDF): Make iterator conditional.
        * internal-fn.def (FTRUNC_INT): New IFN.
        * internal-fn.cc (ftrunc_int_direct): New define.
        (expand_ftrunc_int_optab_fn): New custom expander.
        (direct_ftrunc_int_optab_supported_p): New supported_p.
        * internal-fn.h (direct_internal_fn_info): Add new member
        type1_is_scalar_p.
        * match.pd: Add to the existing TRUNC pattern match.
        * optabs.def (ftrunc_int): New entry.
        * stor-layout.h (element_precision): Moved from here...
        * tree.h (element_precision): ... to here.
        (element_type): New declaration.
        * tree.cc (element_type): New function.
        (element_precision): Changed to use element_type.
        * tree-vect-stmts.cc (vectorizable_internal_function): Add
        support for IFNs with different input types.
        (vect_get_scalar_oprnds): New function.
        (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
        * tree-vect-slp.cc (check_scalar_arg_ok): New function.
        (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
        (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
        * doc/md.texi: New entry for ftrunc pattern name.
        * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
        instructions available.
        * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
        aarch64_frintz options.
        * gcc.target/aarch64/frintnz.c: New test.
        * gcc.target/aarch64/frintnz_vec.c: New test.
        * gcc.target/aarch64/frintnz_slp.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..47368e09b106e5b43640bd4f113abd0b9a15b9c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7564,12 +7564,18 @@
   (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0, %<v>1"
   [(set_attr "type" "f_rint")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a8ad4e5ff215ade06c3ca13a24ef18d259afcb6c..b1f78d87fbe6118e792b00580c6beb23ce63e27c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -173,7 +173,11 @@
 			     SF DF])
 
 ;; Scalar and vector modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+			     (V4SF "TARGET_SIMD")
+			     (V2DF "TARGET_SIMD")
+			     (DF "TARGET_FLOAT")
+			     (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3136,6 +3140,8 @@
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRKP [UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3545,6 +3551,8 @@
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<optab>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
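At the source level, the pattern this patch targets is a float-to-int-to-float round trip, which FRINT32Z/FRINT64Z perform in a single instruction without leaving the floating-point registers. A scalar model of the semantics (plain C++, no AArch64 needed; in-range inputs only, since out-of-range float-to-int conversion is undefined behaviour in C++ whereas the instruction defines it):

```cpp
#include <cassert>

// What the SImode ftrunc variant (FRINT32Z) computes for values that
// fit in a 32-bit signed integer: round toward zero to an integral
// value representable in that integer type.
float
trunc_via_int32 (float x)
{
  int y = static_cast<int> (x); // C++ float->int conversion truncates
  return static_cast<float> (y);
}
```

The merge_trunc1.c test relies on exactly this shape: without FRINTNZ the compiler emits separate fcvtzs/scvtf conversions, and with it the round trip collapses to one frint32z.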
Re: [AArch64] Enable generation of FRINTNZ instructions
"Andre Vieira (lists)" writes: > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index > 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f > 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -6191,6 +6191,15 @@ operands; otherwise, it may not. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern > +@item @samp{ftrunc@var{m}@var{n}2} > +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store > +the result in operand 0. Both operands have mode @var{m}, which is a scalar > or > +vector floating-point mode. An exception must be raised if operand 1 does > not > +fit in a @var{n} mode signed integer as it would have if the truncation > +happened through separate floating point to integer conversion. > + > + Nit: just one blank line here. > @cindex @code{round@var{m}2} instruction pattern > @item @samp{round@var{m}2} > Round operand 1 to the nearest integer, rounding away from zero in the > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi > index > 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a > 100644 > --- a/gcc/doc/sourcebuild.texi > +++ b/gcc/doc/sourcebuild.texi > @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact > hardware vector length. > @item aarch64_fjcvtzs_hw > AArch64 target that is able to generate and execute armv8.3-a FJCVTZS > instruction. > + > +@item aarch64_frintzx_ok > +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z, > +FRINT32X and FRINT64X instructions. 
> @end table > > @subsubsection MIPS-specific attributes > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index > b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 > 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -130,6 +130,7 @@ init_internal_fns () > #define fold_left_direct { 1, 1, false } > #define mask_fold_left_direct { 1, 1, false } > #define check_ptrs_direct { 0, 0, false } > +#define ftrunc_int_direct { 0, 1, true } > > const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { > #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct, > @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab > optab) >return convert_optab_handler (optab, imode, vmode); > } > > +/* Expand FTRUNC_INT call STMT using optab OPTAB. */ > + > +static void > +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > +{ > + class expand_operand ops[2]; > + tree lhs, float_type, int_type; > + rtx target, op; > + > + lhs = gimple_call_lhs (stmt); > + target = expand_normal (lhs); > + op = expand_normal (gimple_call_arg (stmt, 0)); > + > + float_type = TREE_TYPE (lhs); > + int_type = element_type (gimple_call_arg (stmt, 1)); Sorry for the run-around, but now that we don't (need to) vectorise the second argument, I think we can drop this element_type. That in turn means that… > + > + create_output_operand (&ops[0], target, TYPE_MODE (float_type)); > + create_input_operand (&ops[1], op, TYPE_MODE (float_type)); > + > + expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type), > + TYPE_MODE (int_type)), 2, ops); > +} > + > /* Expand LOAD_LANES call STMT using optab OPTAB. 
*/ > > static void > @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, > tree_pair types, > != CODE_FOR_nothing); > } > > +static bool > +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types, > + optimization_type opt_type) > +{ > + return (convert_optab_handler (optab, TYPE_MODE (types.first), > + TYPE_MODE (element_type (types.second)), > + opt_type) != CODE_FOR_nothing); > +} > + …this can use convert_optab_supported_p. > diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c > b/gcc/testsuite/gcc.target/aarch64/frintnz.c > new file mode 100644 > index > ..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c > @@ -0,0 +1,91 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=armv8.5-a" } */ > +/* { dg-require-effective-target aarch64_frintnzx_ok } */ Is this just a cut-&-pasto from a run test? If not, why do we need both this and the dg-options? It feels like one on its own should be enough, with the dg-options being better. The test looks OK without this line. > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +/* > +** f1: > +** frint32zs0, s0 > +** ret > +*/ > +float > +f1 (float x) > +{ > + int y = x; > + return (float) y; > +} > + > +/* > +** f2: > +** frint64zs0, s0 > +** ret > +*/ > +float > +f2 (float x) > +{ > + long long int y = x; > + return (float) y
Re: [AArch64] Enable generation of FRINTNZ instructions
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote: > Yeah seems I forgot to send the latest version, my bad. > > Bootstrapped on aarch64-none-linux. > > OK for trunk? The match.pd part looks OK to me. Thanks, Richard. > gcc/ChangeLog: > > * config/aarch64/aarch64.md (ftrunc2): New > pattern. > * config/aarch64/iterators.md (FRINTNZ): New iterator. > (frintnz_mode): New int attribute. > (VSFDF): Make iterator conditional. > * internal-fn.def (FTRUNC_INT): New IFN. > * internal-fn.c (ftrunc_int_direct): New define. > (expand_ftrunc_int_optab_fn): New custom expander. > (direct_ftrunc_int_optab_supported_p): New supported_p. > * match.pd: Add to the existing TRUNC pattern match. > * optabs.def (ftrunc_int): New entry. > * stor-layout.h (element_precision): Moved from here... > * tree.h (element_precision): ... to here. > (element_type): New declaration. > * tree.c (element_type): New function. > (element_precision): Changed to use element_type. > * tree-vect-stmts.c (vectorizable_internal_function): Add > support for > IFNs with different input types. > (vectorizable_call): Teach to handle IFN_FTRUNC_INT. > * doc/md.texi: New entry for ftrunc pattern name. > * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz > instruction available. > * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. > * gcc.target/aarch64/frintnz.c: New test. > * gcc.target/aarch64/frintnz_vec.c: New test. 
> > On 03/01/2022 12:18, Richard Biener wrote: > > On Wed, 29 Dec 2021, Andre Vieira (lists) wrote: > > > >> Hi Richard, > >> > >> Thank you for the review, I've adopted all above suggestions downstream, I > >> am > >> still surprised how many style things I still miss after years of gcc > >> development :( > >> > >> On 17/12/2021 12:44, Richard Sandiford wrote: > @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, > rhs_type = unsigned_type_node; > } > - int mask_opno = -1; > + /* The argument that is not of the same type as the others. */ > + int diff_opno = -1; > + bool masked = false; > if (internal_fn_p (cfn)) > -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); > +{ > + if (cfn == CFN_FTRUNC_INT) > +/* For FTRUNC this represents the argument that carries the > type of > the > + intermediate signed integer. */ > +diff_opno = 1; > + else > +{ > + /* For masked operations this represents the argument that > carries > the > + mask. */ > + diff_opno = internal_fn_mask_index (as_internal_fn (cfn)); > + masked = diff_opno >= 0; > +} > +} > >>> I think it would be cleaner not to process argument 1 at all for > >>> CFN_FTRUNC_INT. There's no particular need to vectorise it. > >> I agree with this, will change the loop to continue for argument 1 when > >> dealing with non-masked CFN's. > >> > } > […] > diff --git a/gcc/tree.c b/gcc/tree.c > index > 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 > 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, > cst_size_error *perr /* = NULL */) > return true; > } > > -/* Return the precision of the type, or for a complex or vector type the > - precision of the type of its elements. */ > +/* Return the type, or for a complex or vector type the type of its > + elements. 
*/ > -unsigned int > -element_precision (const_tree type) > +tree > +element_type (const_tree type) > { > if (!TYPE_P (type)) > type = TREE_TYPE (type); > @@ -6657,7 +6657,16 @@ element_precision (const_tree type) > if (code == COMPLEX_TYPE || code == VECTOR_TYPE) > type = TREE_TYPE (type); > - return TYPE_PRECISION (type); > + return (tree) type; > >>> I think we should stick a const_cast in element_precision and make > >>> element_type take a plain “type”. As it stands element_type is an > >>> implicit const_cast for many cases. > >>> > >>> Thanks, > >> Was just curious about something here, I thought the purpose of having > >> element_precision (before) and element_type (now) take a const_tree as an > >> argument was to make it clear we aren't changing the input type. I > >> understand > >> that as it stands element_type could be an implicit const_cast (which I > >> should > >> be using rather than the '(tree)' cast), but that's only if 'type' is a
Re: [AArch64] Enable generation of FRINTNZ instructions
Yeah seems I forgot to send the latest version, my bad. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTNZ): New iterator. (frintnz_mode): New int attribute. (VSFDF): Make iterator conditional. * internal-fn.def (FTRUNC_INT): New IFN. * internal-fn.c (ftrunc_int_direct): New define. (expand_ftrunc_int_optab_fn): New custom expander. (direct_ftrunc_int_optab_supported_p): New supported_p. * match.pd: Add to the existing TRUNC pattern match. * optabs.def (ftrunc_int): New entry. * stor-layout.h (element_precision): Moved from here... * tree.h (element_precision): ... to here. (element_type): New declaration. * tree.c (element_type): New function. (element_precision): Changed to use element_type. * tree-vect-stmts.c (vectorizable_internal_function): Add support for IFNs with different input types. (vectorizable_call): Teach to handle IFN_FTRUNC_INT. * doc/md.texi: New entry for ftrunc pattern name. * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test. * gcc.target/aarch64/frintnz_vec.c: New test. On 03/01/2022 12:18, Richard Biener wrote: On Wed, 29 Dec 2021, Andre Vieira (lists) wrote: Hi Richard, Thank you for the review, I've adopted all above suggestions downstream, I am still surprised how many style things I still miss after years of gcc development :( On 17/12/2021 12:44, Richard Sandiford wrote: @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, rhs_type = unsigned_type_node; } - int mask_opno = -1; + /* The argument that is not of the same type as the others. 
*/ + int diff_opno = -1; + bool masked = false; if (internal_fn_p (cfn)) -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); +{ + if (cfn == CFN_FTRUNC_INT) + /* For FTRUNC this represents the argument that carries the type of the + intermediate signed integer. */ + diff_opno = 1; + else + { + /* For masked operations this represents the argument that carries the +mask. */ + diff_opno = internal_fn_mask_index (as_internal_fn (cfn)); + masked = diff_opno >= 0; + } +} I think it would be cleaner not to process argument 1 at all for CFN_FTRUNC_INT. There's no particular need to vectorise it. I agree with this, will change the loop to continue for argument 1 when dealing with non-masked CFN's. } […] diff --git a/gcc/tree.c b/gcc/tree.c index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */) return true; } -/* Return the precision of the type, or for a complex or vector type the - precision of the type of its elements. */ +/* Return the type, or for a complex or vector type the type of its + elements. */ -unsigned int -element_precision (const_tree type) +tree +element_type (const_tree type) { if (!TYPE_P (type)) type = TREE_TYPE (type); @@ -6657,7 +6657,16 @@ element_precision (const_tree type) if (code == COMPLEX_TYPE || code == VECTOR_TYPE) type = TREE_TYPE (type); - return TYPE_PRECISION (type); + return (tree) type; I think we should stick a const_cast in element_precision and make element_type take a plain “type”. As it stands element_type is an implicit const_cast for many cases. Thanks, Was just curious about something here, I thought the purpose of having element_precision (before) and element_type (now) take a const_tree as an argument was to make it clear we aren't changing the input type. 
I understand that as it stands element_type could be an implicit const_cast (which I should be using rather than the '(tree)' cast), but that's only if 'type' is a type that isn't complex/vector, either way, we are conforming to the promise that we aren't changing the incoming type, what the caller then does with the result is up to them no? I don't mind making the changes, just trying to understand the reasoning behind it. I'll send in a new patch with all the changes after the review on the match.pd stuff. I'm missing an updated patch after my initial review of the match.pd stuff so not sure what to review. Can you re-post and updated patch? Thanks, Richard.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c 100644 --- a/gcc/config
Re: [AArch64] Enable generation of FRINTNZ instructions
On Wed, 29 Dec 2021, Andre Vieira (lists) wrote: > Hi Richard, > > Thank you for the review, I've adopted all above suggestions downstream, I am > still surprised how many style things I still miss after years of gcc > development :( > > On 17/12/2021 12:44, Richard Sandiford wrote: > > > >> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, > >> rhs_type = unsigned_type_node; > >> } > >> - int mask_opno = -1; > >> + /* The argument that is not of the same type as the others. */ > >> + int diff_opno = -1; > >> + bool masked = false; > >> if (internal_fn_p (cfn)) > >> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); > >> +{ > >> + if (cfn == CFN_FTRUNC_INT) > >> + /* For FTRUNC this represents the argument that carries the type of > >> the > >> + intermediate signed integer. */ > >> + diff_opno = 1; > >> + else > >> + { > >> +/* For masked operations this represents the argument that carries > >> the > >> + mask. */ > >> +diff_opno = internal_fn_mask_index (as_internal_fn (cfn)); > >> +masked = diff_opno >= 0; > >> + } > >> +} > > I think it would be cleaner not to process argument 1 at all for > > CFN_FTRUNC_INT. There's no particular need to vectorise it. > > I agree with this, will change the loop to continue for argument 1 when > dealing with non-masked CFN's. > > >>} > >> […] > >> diff --git a/gcc/tree.c b/gcc/tree.c > >> index > >> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 > >> 100644 > >> --- a/gcc/tree.c > >> +++ b/gcc/tree.c > >> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, > >> cst_size_error *perr /* = NULL */) > >> return true; > >> } > >> > >> -/* Return the precision of the type, or for a complex or vector type the > >> - precision of the type of its elements. */ > >> +/* Return the type, or for a complex or vector type the type of its > >> + elements. 
*/ > >> -unsigned int > >> -element_precision (const_tree type) > >> +tree > >> +element_type (const_tree type) > >> { > >> if (!TYPE_P (type)) > >> type = TREE_TYPE (type); > >> @@ -6657,7 +6657,16 @@ element_precision (const_tree type) > >> if (code == COMPLEX_TYPE || code == VECTOR_TYPE) > >> type = TREE_TYPE (type); > >> - return TYPE_PRECISION (type); > >> + return (tree) type; > > I think we should stick a const_cast in element_precision and make > > element_type take a plain “type”. As it stands element_type is an > > implicit const_cast for many cases. > > > > Thanks, > Was just curious about something here, I thought the purpose of having > element_precision (before) and element_type (now) take a const_tree as an > argument was to make it clear we aren't changing the input type. I understand > that as it stands element_type could be an implicit const_cast (which I should > be using rather than the '(tree)' cast), but that's only if 'type' is a type > that isn't complex/vector, either way, we are conforming to the promise that > we aren't changing the incoming type, what the caller then does with the > result is up to them no? > > I don't mind making the changes, just trying to understand the reasoning > behind it. > > I'll send in a new patch with all the changes after the review on the match.pd > stuff. I'm missing an updated patch after my initial review of the match.pd stuff so not sure what to review. Can you re-post and updated patch? Thanks, Richard.
Re: [AArch64] Enable generation of FRINTNZ instructions
"Andre Vieira (lists)" writes: > On 17/12/2021 12:44, Richard Sandiford wrote: >> >>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, >>> rhs_type = unsigned_type_node; >>> } >>> >>> - int mask_opno = -1; >>> + /* The argument that is not of the same type as the others. */ >>> + int diff_opno = -1; >>> + bool masked = false; >>> if (internal_fn_p (cfn)) >>> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); >>> +{ >>> + if (cfn == CFN_FTRUNC_INT) >>> + /* For FTRUNC this represents the argument that carries the type of the >>> + intermediate signed integer. */ >>> + diff_opno = 1; >>> + else >>> + { >>> + /* For masked operations this represents the argument that carries the >>> +mask. */ >>> + diff_opno = internal_fn_mask_index (as_internal_fn (cfn)); >>> + masked = diff_opno >= 0; >>> + } >>> +} >> I think it would be cleaner not to process argument 1 at all for >> CFN_FTRUNC_INT. There's no particular need to vectorise it. > > I agree with this, will change the loop to continue for argument 1 when > dealing with non-masked CFN's. > >>> } >>> […] >>> diff --git a/gcc/tree.c b/gcc/tree.c >>> index >>> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 >>> 100644 >>> --- a/gcc/tree.c >>> +++ b/gcc/tree.c >>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, >>> cst_size_error *perr /* = NULL */) >>> return true; >>> } >>> >>> -/* Return the precision of the type, or for a complex or vector type the >>> - precision of the type of its elements. */ >>> +/* Return the type, or for a complex or vector type the type of its >>> + elements. 
*/ >>> >>> -unsigned int >>> -element_precision (const_tree type) >>> +tree >>> +element_type (const_tree type) >>> { >>> if (!TYPE_P (type)) >>> type = TREE_TYPE (type); >>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type) >>> if (code == COMPLEX_TYPE || code == VECTOR_TYPE) >>> type = TREE_TYPE (type); >>> >>> - return TYPE_PRECISION (type); >>> + return (tree) type; >> I think we should stick a const_cast in element_precision and make >> element_type take a plain “type”. As it stands element_type is an >> implicit const_cast for many cases. >> >> Thanks, > Was just curious about something here, I thought the purpose of having > element_precision (before) and element_type (now) take a const_tree as > an argument was to make it clear we aren't changing the input type. I > understand that as it stands element_type could be an implicit > const_cast (which I should be using rather than the '(tree)' cast), but > that's only if 'type' is a type that isn't complex/vector, either way, > we are conforming to the promise that we aren't changing the incoming > type, what the caller then does with the result is up to them no? > > I don't mind making the changes, just trying to understand the reasoning > behind it. The problem with the above is that functions like the following become well-typed: void foo (const_tree t) { TYPE_MODE (element_type (t)) = VOIDmode; } even though element_type (t) could well be t. One of the points of const_tree (and const pointer targets in general) is to use the type system to enforce the promise that the value isn't changed. I guess the above is similar to the traditional problem with functions like index and strstr, which take a const char * but return a char *. So for example: void foo (const char *x) { index (x, '.') = 0; } is well-typed. But the equivalent C++ code (using iterators) would be rejected. 
If C allowed overloading them the correct prototypes would be:

  const char *index (const char *, int);
  char *index (char *, int);

And I think the same applies here.  Either we should provide two functions:

  const_tree element_type (const_tree);
  tree element_type (tree);

or just the latter.

Thanks,
Richard
Re: [AArch64] Enable generation of FRINTNZ instructions
Hi Richard, Thank you for the review, I've adopted all above suggestions downstream, I am still surprised how many style things I still miss after years of gcc development :( On 17/12/2021 12:44, Richard Sandiford wrote: @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, rhs_type = unsigned_type_node; } - int mask_opno = -1; + /* The argument that is not of the same type as the others. */ + int diff_opno = -1; + bool masked = false; if (internal_fn_p (cfn)) -mask_opno = internal_fn_mask_index (as_internal_fn (cfn)); +{ + if (cfn == CFN_FTRUNC_INT) + /* For FTRUNC this represents the argument that carries the type of the + intermediate signed integer. */ + diff_opno = 1; + else + { + /* For masked operations this represents the argument that carries the +mask. */ + diff_opno = internal_fn_mask_index (as_internal_fn (cfn)); + masked = diff_opno >= 0; + } +} I think it would be cleaner not to process argument 1 at all for CFN_FTRUNC_INT. There's no particular need to vectorise it. I agree with this, will change the loop to continue for argument 1 when dealing with non-masked CFN's. } […] diff --git a/gcc/tree.c b/gcc/tree.c index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */) return true; } -/* Return the precision of the type, or for a complex or vector type the - precision of the type of its elements. */ +/* Return the type, or for a complex or vector type the type of its + elements. 
*/ -unsigned int -element_precision (const_tree type) +tree +element_type (const_tree type) { if (!TYPE_P (type)) type = TREE_TYPE (type); @@ -6657,7 +6657,16 @@ element_precision (const_tree type) if (code == COMPLEX_TYPE || code == VECTOR_TYPE) type = TREE_TYPE (type); - return TYPE_PRECISION (type); + return (tree) type; I think we should stick a const_cast in element_precision and make element_type take a plain “type”. As it stands element_type is an implicit const_cast for many cases. Thanks, Was just curious about something here, I thought the purpose of having element_precision (before) and element_type (now) take a const_tree as an argument was to make it clear we aren't changing the input type. I understand that as it stands element_type could be an implicit const_cast (which I should be using rather than the '(tree)' cast), but that's only if 'type' is a type that isn't complex/vector, either way, we are conforming to the promise that we aren't changing the incoming type, what the caller then does with the result is up to them no? I don't mind making the changes, just trying to understand the reasoning behind it. I'll send in a new patch with all the changes after the review on the match.pd stuff. Thanks, Andre
Re: [AArch64] Enable generation of FRINTNZ instructions
"Andre Vieira (lists) via Gcc-patches" writes: > On 22/11/2021 11:41, Richard Biener wrote: >> >>> On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with passing the integer type as an operand work around. >> We already special case some IFNs there (masked load/store and gather) >> to ignore some args, so that would just add to this set. >> >> Richard. > Hi, > > Reworked it to add support of the new IFN to the vectorizer. Was > initially trying to make vectorizable_call and > vectorizable_internal_function handle IFNs with different inputs more > generically, using the information we have in the _direct structs > regarding what operands to get the modes from. Unfortunately, that > wasn't straightforward because of how vectorizable_call assumes operands > have the same type and uses the type of the DEF_STMT_INFO of the > non-constant operands (either output operand or non-constant inputs) to > determine the type of constants. I assume there is some reason why we > use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on > the argument types. That is why I ended up with this sort of half-way > mix of both, which still allows room to add more IFNs that don't take > inputs of the same type, but require adding a bit of special casing > similar to the IFN_FTRUNC_INT and masking ones. > > Bootstrapped on aarch64-none-linux. 
Still leaving the match.pd stuff to Richard, but otherwise: > index > bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f > 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF > "TARGET_SIMD_F16INST") > SF DF]) > > ;; Scalar and vetor modes for SF, DF. > -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF]) > +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD") Nit: excess space between [ and (. > + (V4SF "TARGET_SIMD") > + (V2DF "TARGET_SIMD") > + (DF "TARGET_FLOAT") > + (SF "TARGET_FLOAT")]) > > ;; Advanced SIMD single Float modes. > (define_mode_iterator VDQSF [V2SF V4SF]) > […] > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index > 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c > 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -6175,6 +6175,15 @@ operands; otherwise, it may not. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern > +@item @samp{ftrunc@var{m}@var{n}2} > +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store > +the result in operand 0. Both operands have mode @var{m}, which is a scalar > or > +vector floating-point mode. Exception must be thrown if operand 1 does not > fit Maybe “An exception must be raised”? “thrown” makes it sound like a signal must be raised or C++ exception thrown. > +in a @var{n} mode signed integer as it would have if the truncation happened > +through separate floating point to integer conversion. 
> + > + > @cindex @code{round@var{m}2} instruction pattern > @item @samp{round@var{m}2} > Round operand 1 to the nearest integer, rounding away from zero in the > […] > @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, > tree_pair types, > != CODE_FOR_nothing); > } > > +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab, > + tree_pair types, > + optimization_type opt_type) Formatting nit: should be a line break after “bool” > +{ > + return (convert_optab_handler (optab, TYPE_MODE (types.first), > + TYPE_MODE (element_type (types.second)), > + opt_type) != CODE_FOR_nothing); > +} > + > #define direct_unary_optab_supported_p direct_optab_supported_p > #define direct_binary_optab_supported_p direct_optab_supported_p > #define direct_ternary_optab_supported_p direct_optab_supported_p > […] > diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c > b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c > new file mode 100644 > index > ..b93304eb2acb3d3d954eebee51d77ff23fee68ac > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c > @@ -0,0 +1,47 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -march=armv8.5-a" } */ > +/* { dg-require-effective-target aarch64_frintnzx_ok } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +
Re: [AArch64] Enable generation of FRINTNZ instructions
ping On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote: On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with passing the integer type as an operand work around. We already special case some IFNs there (masked load/store and gather) to ignore some args, so that would just add to this set. Richard. Hi, Reworked it to add support of the new IFN to the vectorizer. Was initially trying to make vectorizable_call and vectorizable_internal_function handle IFNs with different inputs more generically, using the information we have in the _direct structs regarding what operands to get the modes from. Unfortunately, that wasn't straightforward because of how vectorizable_call assumes operands have the same type and uses the type of the DEF_STMT_INFO of the non-constant operands (either output operand or non-constant inputs) to determine the type of constants. I assume there is some reason why we use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on the argument types. That is why I ended up with this sort of half-way mix of both, which still allows room to add more IFNs that don't take inputs of the same type, but require adding a bit of special casing similar to the IFN_FTRUNC_INT and masking ones. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTNZ): New iterator. (frintnz_mode): New int attribute. (VSFDF): Make iterator conditional. * internal-fn.def (FTRUNC_INT): New IFN. * internal-fn.c (ftrunc_int_direct): New define. (expand_ftrunc_int_optab_fn): New custom expander. 
	(direct_ftrunc_int_optab_supported_p): New supported_p.
	* match.pd: Add to the existing TRUNC pattern match.
	* optabs.def (ftrunc_int): New entry.
	* stor-layout.h (element_precision): Moved from here...
	* tree.h (element_precision): ... to here.
	(element_type): New declaration.
	* tree.c (element_type): New function.
	(element_precision): Changed to use element_type.
	* tree-vect-stmts.c (vectorizable_internal_function): Add support for
	IFNs with different input types.
	(vectorizable_call): Teach to handle IFN_FTRUNC_INT.
	* doc/md.texi: New entry for ftrunc pattern name.
	* doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
	instruction available.
	* lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
	* gcc.target/aarch64/frintnz.c: New test.
	* gcc.target/aarch64/frintnz_vec.c: New test.
Re: [AArch64] Enable generation of FRINTNZ instructions
On 18/11/2021 11:05, Richard Biener wrote:
> + (if (!flag_trapping_math
> +      && direct_internal_fn_supported_p (IFN_TRUNC, type,
> +                                         OPTIMIZE_FOR_BOTH))
> +  (IFN_TRUNC @0)
> #endif
>
> does IFN_FTRUNC_INT preserve the same exceptions as doing explicit
> intermediate float->int conversions?  I think I'd prefer to have
> !flag_trapping_math on both cases.

I realized I never responded to this.  The AArch64 instructions mimic the
behaviour you'd see if you were doing explicit conversions, so I'll be
defining the new IFN and optab to require the same, such that these can be
used by the compiler when flag_trapping_math.  In the patch I sent last I
added some lines to the md.texi description of the optab to that effect.
Re: [AArch64] Enable generation of FRINTNZ instructions
On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with passing the integer type as an operand work around. We already special case some IFNs there (masked load/store and gather) to ignore some args, so that would just add to this set. Richard. Hi, Reworked it to add support of the new IFN to the vectorizer. Was initially trying to make vectorizable_call and vectorizable_internal_function handle IFNs with different inputs more generically, using the information we have in the _direct structs regarding what operands to get the modes from. Unfortunately, that wasn't straightforward because of how vectorizable_call assumes operands have the same type and uses the type of the DEF_STMT_INFO of the non-constant operands (either output operand or non-constant inputs) to determine the type of constants. I assume there is some reason why we use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on the argument types. That is why I ended up with this sort of half-way mix of both, which still allows room to add more IFNs that don't take inputs of the same type, but require adding a bit of special casing similar to the IFN_FTRUNC_INT and masking ones. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTNZ): New iterator. (frintnz_mode): New int attribute. (VSFDF): Make iterator conditional. * internal-fn.def (FTRUNC_INT): New IFN. * internal-fn.c (ftrunc_int_direct): New define. (expand_ftrunc_int_optab_fn): New custom expander. (direct_ftrunc_int_optab_supported_p): New supported_p. * match.pd: Add to the existing TRUNC pattern match. 
* optabs.def (ftrunc_int): New entry. * stor-layout.h (element_precision): Moved from here... * tree.h (element_precision): ... to here. (element_type): New declaration. * tree.c (element_type): New function. (element_precision): Changed to use element_type. * tree-vect-stmts.c (vectorizable_internal_function): Add support for IFNs with different input types. (vectorizable_call): Teach to handle IFN_FTRUNC_INT. * doc/md.texi: New entry for ftrunc pattern name. * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test. * gcc.target/aarch64/frintnz_vec.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti" (set_attr "speculation_barrier" "true")] ) +(define_expand "ftrunc2" + [(set (match_operand:VSFDF 0 "register_operand" "=w") +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] + FRINTNZ))] + "TARGET_FRINT" +) + (define_insn "aarch64_" [(set (match_operand:VSFDF 0 "register_operand" "=w") (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] FRINTNZX))] - "TARGET_FRINT && TARGET_FLOAT - && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" + "TARGET_FRINT" "\\t%0, %1" [(set_attr "type" "f_rint")] ) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST") SF DF]) ;; Scalar and vetor modes for SF, DF. 
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF]) +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD") + (V4SF "TARGET_SIMD") + (V2DF "TARGET_SIMD") + (DF "TARGET_FLOAT") + (SF "TARGET_FLOAT")]) ;; Advanced SIMD single Float modes. (define_mode_iterator VDQSF [V2SF V4SF]) @@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X UNSPEC_FRINT64Z UNSPEC_FRINT64X]) +(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z]) + (define_int_iter
Re: [AArch64] Enable generation of FRINTNZ instructions
On Mon, 22 Nov 2021, Andre Vieira (lists) wrote: > > On 18/11/2021 11:05, Richard Biener wrote: > > > > @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > trapping behaviour, so require !flag_trapping_math. */ > > #if GIMPLE > > (simplify > > - (float (fix_trunc @0)) > > - (if (!flag_trapping_math > > - && types_match (type, TREE_TYPE (@0)) > > - && direct_internal_fn_supported_p (IFN_TRUNC, type, > > - OPTIMIZE_FOR_BOTH)) > > - (IFN_TRUNC @0))) > > + (float (fix_trunc@1 @0)) > > + (if (types_match (type, TREE_TYPE (@0))) > > +(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED > > +&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, > > + TREE_TYPE (@1), > > OPTIMIZE_FOR_BOTH)) > > + (with { > > + tree int_type = TREE_TYPE (@1); > > + unsigned HOST_WIDE_INT max_int_c > > + = (1ULL << (element_precision (int_type) - 1)) - 1; > > > > That's only half-way supporting vector types I fear - you use > > element_precision but then build a vector integer constant > > in an unsupported way. I suppose vector support isn't present > > for arm? The cleanest way would probably be to do > > > > tree int_type = element_type (@1); > > > > with providing element_type in tree.[ch] like we provide > > element_precision. > This is a good shout and made me think about something I hadn't before... I > thought I could handle the vector forms later, but the problem is if I add > support for the scalar, it will stop the vectorizer. It seems > vectorizable_call expects all arguments to have the same type, which doesn't > work with passing the integer type as an operand work around. We already special case some IFNs there (masked load/store and gather) to ignore some args, so that would just add to this set. Richard.
Re: [AArch64] Enable generation of FRINTNZ instructions
On 18/11/2021 11:05, Richard Biener wrote: @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) trapping behaviour, so require !flag_trapping_math. */ #if GIMPLE (simplify - (float (fix_trunc @0)) - (if (!flag_trapping_math - && types_match (type, TREE_TYPE (@0)) - && direct_internal_fn_supported_p (IFN_TRUNC, type, - OPTIMIZE_FOR_BOTH)) - (IFN_TRUNC @0))) + (float (fix_trunc@1 @0)) + (if (types_match (type, TREE_TYPE (@0))) +(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED +&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, + TREE_TYPE (@1), OPTIMIZE_FOR_BOTH)) + (with { + tree int_type = TREE_TYPE (@1); + unsigned HOST_WIDE_INT max_int_c + = (1ULL << (element_precision (int_type) - 1)) - 1; That's only half-way supporting vector types I fear - you use element_precision but then build a vector integer constant in an unsupported way. I suppose vector support isn't present for arm? The cleanest way would probably be to do tree int_type = element_type (@1); with providing element_type in tree.[ch] like we provide element_precision. This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with passing the integer type as an operand work around. Should I go back to two separate IFN's, could still have the single optab. Regards, Andre
Re: [AArch64] Enable generation of FRINTNZ instructions
On Wed, 17 Nov 2021, Andre Vieira (lists) wrote: > > On 16/11/2021 12:10, Richard Biener wrote: > > On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: > > > >> On 12/11/2021 10:56, Richard Biener wrote: > >>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: > >>> > Hi, > > This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding > optabs and mappings. It also creates a backend pattern to implement them > for > aarch64 and a match.pd pattern to idiom recognize these. > These IFN's (and optabs) represent a truncation towards zero, as if > performed > by first casting it to a signed integer of 32 or 64 bits and then back to > the > same floating point type/mode. > > The match.pd pattern choses to use these, when supported, regardless of > trapping math, since these new patterns mimic the original behavior of > truncating through an integer. > > I didn't think any of the existing IFN's represented these. I know it's a > bit > late in stage 1, but I thought this might be OK given it's only used by a > single target and should have very little impact on anything else. > > Bootstrapped on aarch64-none-linux. > > OK for trunk? > >>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion > >>> optab (with two modes), so not > >>> > >>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") > >>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") > >>> > >>> but > >>> > >>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") > >>> > >>> or so? I know that gets somewhat awkward for the internal function, > >>> but IMHO we shouldn't tie our hands because of that? > >> I tried doing this originally, but indeed I couldn't find a way to > >> correctly > >> tie the internal function to it. > >> > >> direct_optab_supported_p with multiple types expect those to be of the same > >> mode. I see convert_optab_supported_p does but I don't know how that is > >> used... > >> > >> Any ideas? > > No "nice" ones. 
The "usual" way is to provide fake arguments that > > specify the type/mode. We could use an integer argument directly > > secifying the mode (then the IL would look host dependent - ugh), > > or specify a constant zero in the intended mode (less visibly > > obvious - but at least with -gimple dumping you'd see the type...). > Hi, > > So I reworked this to have a single optab and IFN. This required a bit of > fiddling with custom expander and supported_p functions for the IFN. I decided > to pass a MAX_INT for the 'int' type to the IFN to be able to pass on the size > of the int we use as an intermediate cast. I tried 0 first, but gcc was being > too smart and just demoted it to an 'int' for the long long test-cases. > > Bootstrapped on aarch64-none-linux. > > OK for trunk? @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) trapping behaviour, so require !flag_trapping_math. */ #if GIMPLE (simplify - (float (fix_trunc @0)) - (if (!flag_trapping_math - && types_match (type, TREE_TYPE (@0)) - && direct_internal_fn_supported_p (IFN_TRUNC, type, - OPTIMIZE_FOR_BOTH)) - (IFN_TRUNC @0))) + (float (fix_trunc@1 @0)) + (if (types_match (type, TREE_TYPE (@0))) +(if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED +&& direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, + TREE_TYPE (@1), OPTIMIZE_FOR_BOTH)) + (with { + tree int_type = TREE_TYPE (@1); + unsigned HOST_WIDE_INT max_int_c + = (1ULL << (element_precision (int_type) - 1)) - 1; That's only half-way supporting vector types I fear - you use element_precision but then build a vector integer constant in an unsupported way. I suppose vector support isn't present for arm? The cleanest way would probably be to do tree int_type = element_type (@1); with providing element_type in tree.[ch] like we provide element_precision. 
+ } + (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); })) Then you could use wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type), SIGNED)) to build the special integer constant (which seems to be always scalar). + (if (!flag_trapping_math + && direct_internal_fn_supported_p (IFN_TRUNC, type, +OPTIMIZE_FOR_BOTH)) + (IFN_TRUNC @0) #endif does IFN_FTRUNC_INT preserve the same exceptions as doing explicit intermediate float->int conversions? I think I'd prefer to have !flag_trapping_math on both cases. > gcc/ChangeLog: > > * config/aarch64/aarch64.md (ftrunc2): New > pattern. > * config/aarch64/iterators.md (FRINTZ): New iterator. > * doc/md.texi: New entry for ftrunc pattern name. > * internal-fn.def (FTRUNC_INT): New IFN. > * match.pd: Add to the existing TRUNC pattern match. > * optabs.def (ftrunc_int): New entry. > > gcc
Re: [AArch64] Enable generation of FRINTNZ instructions
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index > 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 > 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti" > (set_attr "speculation_barrier" "true")] > ) > > +(define_expand "ftrunc2" > + [(set (match_operand:VSFDF 0 "register_operand" "=w") > +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] > + FRINTNZ))] > + "TARGET_FRINT && TARGET_FLOAT > + && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" > +) Probably just me, but this condition seems quite hard to read. I think it'd be better to add conditions to the VSFDF definition instead, a bit like we do for the HF entries in VHSDF_HSDF and VHSDF_DF. I.e.: (define_mode_iterator VSFDF [(V2SF "TARGET_SIMD") (V4SF "TARGET_SIMD") (V2DF "TARGET_SIMD") (SF "TARGET_FLOAT") (DF "TARGET_FLOAT")]) Then the condition can be "TARGET_FRINT". Same for the existing aarch64_. > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index > bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 > 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary) > DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary) > DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary) > DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary) > +DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int) ftrunc_int should be described in the comment at the top of the file. E.g.: - ftrunc_int: a unary conversion optab that takes and returns values of the same mode, but internally converts via another mode. This second mode is specified using a dummy final function argument. 
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c > b/gcc/testsuite/gcc.target/aarch64/frintnz.c > new file mode 100644 > index > ..2e1971f8aa11d8b95f454d03a03e050a3bf96747 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c > @@ -0,0 +1,88 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=armv8.5-a" } */ > +/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +/* > +** f1: > +** ... > +** frint32zs0, s0 > +** ... Are these functions ever more than just: f1: frint32zs0, s0 ret ? If not, I think we should match that sequence and “defend” the good codegen. The problem with ... on both sides is that it's then not clear why we can rely on register 0 being used. > +*/ > +float > +f1 (float x) > +{ > + int y = x; > + return (float) y; > +} > + > +/* > +** f2: > +** ... > +** frint64zs0, s0 > +** ... > +*/ > +float > +f2 (float x) > +{ > + long long int y = x; > + return (float) y; > +} > + > +/* > +** f3: > +** ... > +** frint32zd0, d0 > +** ... > +*/ > +double > +f3 (double x) > +{ > + int y = x; > + return (double) y; > +} > + > +/* > +** f4: > +** ... > +** frint64zd0, d0 > +** ... > +*/ > +double > +f4 (double x) > +{ > + long long int y = x; > + return (double) y; > +} > + > +float > +f1_dont (float x) > +{ > + unsigned int y = x; > + return (float) y; > +} > + > +float > +f2_dont (float x) > +{ > + unsigned long long int y = x; > + return (float) y; > +} > + > +double > +f3_dont (double x) > +{ > + unsigned int y = x; > + return (double) y; > +} > + > +double > +f4_dont (double x) > +{ > + unsigned long long int y = x; > + return (double) y; > +} > + > +/* Make sure the 'dont's don't generate any frintNz. 
*/ > +/* { dg-final { scan-assembler-times {frint32z} 2 } } */ > +/* { dg-final { scan-assembler-times {frint64z} 2 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c > b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c > index > 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c > +++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c > @@ -1,5 +1,6 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -ffast-math" } */ > +/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */ > > float > f1 (float x) > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index > 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 > 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive > { } { > }] > } > > +# Return 1 if the target supports ARMv8.
Re: [AArch64] Enable generation of FRINTNZ instructions
On 16/11/2021 12:10, Richard Biener wrote: On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: On 12/11/2021 10:56, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings. It also creates a backend pattern to implement them for aarch64 and a match.pd pattern to idiom recognize these. These IFN's (and optabs) represent a truncation towards zero, as if performed by first casting it to a signed integer of 32 or 64 bits and then back to the same floating point type/mode. The match.pd pattern choses to use these, when supported, regardless of trapping math, since these new patterns mimic the original behavior of truncating through an integer. I didn't think any of the existing IFN's represented these. I know it's a bit late in stage 1, but I thought this might be OK given it's only used by a single target and should have very little impact on anything else. Bootstrapped on aarch64-none-linux. OK for trunk? On the RTL side ftrunc32/ftrunc64 would probably be better a conversion optab (with two modes), so not +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") but OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") or so? I know that gets somewhat awkward for the internal function, but IMHO we shouldn't tie our hands because of that? I tried doing this originally, but indeed I couldn't find a way to correctly tie the internal function to it. direct_optab_supported_p with multiple types expect those to be of the same mode. I see convert_optab_supported_p does but I don't know how that is used... Any ideas? No "nice" ones. The "usual" way is to provide fake arguments that specify the type/mode. We could use an integer argument directly secifying the mode (then the IL would look host dependent - ugh), or specify a constant zero in the intended mode (less visibly obvious - but at least with -gimple dumping you'd see the type...). 
Hi, So I reworked this to have a single optab and IFN. This required a bit of fiddling with custom expander and supported_p functions for the IFN. I decided to pass a MAX_INT for the 'int' type to the IFN to be able to pass on the size of the int we use as an intermediate cast. I tried 0 first, but gcc was being too smart and just demoted it to an 'int' for the long long test-cases. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTZ): New iterator. * doc/md.texi: New entry for ftrunc pattern name. * internal-fn.def (FTRUNC_INT): New IFN. * match.pd: Add to the existing TRUNC pattern match. * optabs.def (ftrunc_int): New entry. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti" (set_attr "speculation_barrier" "true")] ) +(define_expand "ftrunc2" + [(set (match_operand:VSFDF 0 "register_operand" "=w") +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] + FRINTNZ))] + "TARGET_FRINT && TARGET_FLOAT + && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" +) + (define_insn "aarch64_" [(set (match_operand:VSFDF 0 "register_operand" "=w") (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA 
(define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X UNSPEC_FRINT64Z UNSPEC_FRINT64X]) +(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z]) + (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB]) (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB]) @@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s") (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x") (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")]) +(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")]) + ;; The condition associated with an UNSPEC_COND_. (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq") (UNSPEC_COND_CMPG
Re: [AArch64] Enable generation of FRINTNZ instructions
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: > > On 12/11/2021 10:56, Richard Biener wrote: > > On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding > >> optabs and mappings. It also creates a backend pattern to implement them > >> for > >> aarch64 and a match.pd pattern to idiom recognize these. > >> These IFN's (and optabs) represent a truncation towards zero, as if > >> performed > >> by first casting it to a signed integer of 32 or 64 bits and then back to > >> the > >> same floating point type/mode. > >> > >> The match.pd pattern choses to use these, when supported, regardless of > >> trapping math, since these new patterns mimic the original behavior of > >> truncating through an integer. > >> > >> I didn't think any of the existing IFN's represented these. I know it's a > >> bit > >> late in stage 1, but I thought this might be OK given it's only used by a > >> single target and should have very little impact on anything else. > >> > >> Bootstrapped on aarch64-none-linux. > >> > >> OK for trunk? > > On the RTL side ftrunc32/ftrunc64 would probably be better a conversion > > optab (with two modes), so not > > > > +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") > > +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") > > > > but > > > > OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") > > > > or so? I know that gets somewhat awkward for the internal function, > > but IMHO we shouldn't tie our hands because of that? > I tried doing this originally, but indeed I couldn't find a way to correctly > tie the internal function to it. > > direct_optab_supported_p with multiple types expect those to be of the same > mode. I see convert_optab_supported_p does but I don't know how that is > used... > > Any ideas? No "nice" ones. The "usual" way is to provide fake arguments that specify the type/mode. 
We could use an integer argument directly specifying the mode (then the IL would look host dependent - ugh), or specify a constant zero in the intended mode (less visibly obvious - but at least with -gimple dumping you'd see the type...). In any case if people think going with two optabs is OK then please consider using ftruncsi and ftruncdi instead of 32/64. Richard.
Re: [AArch64] Enable generation of FRINTNZ instructions
On 12/11/2021 10:56, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings. It also creates a backend pattern to implement them for aarch64 and a match.pd pattern to idiom recognize these. These IFN's (and optabs) represent a truncation towards zero, as if performed by first casting it to a signed integer of 32 or 64 bits and then back to the same floating point type/mode. The match.pd pattern choses to use these, when supported, regardless of trapping math, since these new patterns mimic the original behavior of truncating through an integer. I didn't think any of the existing IFN's represented these. I know it's a bit late in stage 1, but I thought this might be OK given it's only used by a single target and should have very little impact on anything else. Bootstrapped on aarch64-none-linux. OK for trunk? On the RTL side ftrunc32/ftrunc64 would probably be better a conversion optab (with two modes), so not +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") but OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") or so? I know that gets somewhat awkward for the internal function, but IMHO we shouldn't tie our hands because of that? I tried doing this originally, but indeed I couldn't find a way to correctly tie the internal function to it. direct_optab_supported_p with multiple types expect those to be of the same mode. I see convert_optab_supported_p does but I don't know how that is used... Any ideas?
Re: [AArch64] Enable generation of FRINTNZ instructions
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: > Hi, > > This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding > optabs and mappings. It also creates a backend pattern to implement them for > aarch64 and a match.pd pattern to idiom recognize these. > These IFN's (and optabs) represent a truncation towards zero, as if performed > by first casting it to a signed integer of 32 or 64 bits and then back to the > same floating point type/mode. > > The match.pd pattern choses to use these, when supported, regardless of > trapping math, since these new patterns mimic the original behavior of > truncating through an integer. > > I didn't think any of the existing IFN's represented these. I know it's a bit > late in stage 1, but I thought this might be OK given it's only used by a > single target and should have very little impact on anything else. > > Bootstrapped on aarch64-none-linux. > > OK for trunk? On the RTL side ftrunc32/ftrunc64 would probably be better a conversion optab (with two modes), so not +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") but OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") or so? I know that gets somewhat awkward for the internal function, but IMHO we shouldn't tie our hands because of that? Richard. > gcc/ChangeLog: > > * config/aarch64/aarch64.md (ftrunc2): New > pattern. > * config/aarch64/iterators.md (FRINTZ): New iterator. > * doc/md.texi: New entry for ftrunc pattern name. > * internal-fn.def (FTRUNC32): New IFN. > (FTRUNC64): Likewise. > * match.pd: Add to the existing TRUNC pattern match. > * optabs.def (OPTAB_D): New entries for ftrunc. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz > instruction available. > * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. > * gcc.target/aarch64/frintnz.c: New test. 
> > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
[AArch64] Enable generation of FRINTNZ instructions
Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings. It also creates a backend pattern to implement them for aarch64 and a match.pd pattern to idiom recognize these. These IFN's (and optabs) represent a truncation towards zero, as if performed by first casting it to a signed integer of 32 or 64 bits and then back to the same floating point type/mode. The match.pd pattern chooses to use these, when supported, regardless of trapping math, since these new patterns mimic the original behavior of truncating through an integer. I didn't think any of the existing IFN's represented these. I know it's a bit late in stage 1, but I thought this might be OK given it's only used by a single target and should have very little impact on anything else. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTZ): New iterator. * doc/md.texi: New entry for ftrunc pattern name. * internal-fn.def (FTRUNC32): New IFN. (FTRUNC64): Likewise. * match.pd: Add to the existing TRUNC pattern match. * optabs.def (OPTAB_D): New entries for ftrunc. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test. 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4035e061706793849c68ae09bcb2e4b9580ab7b6..ad4e04d7c874da095513442e7d7f247791d8921d 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7345,6 +7345,16 @@ (define_insn "despeculate_simpleti" (set_attr "speculation_barrier" "true")] ) +(define_insn "ftrunc2" + [(set (match_operand:VSFDF 0 "register_operand" "=w") +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] + FRINTNZ))] + "TARGET_FRINT && TARGET_FLOAT + && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" + "\\t%0, %1" + [(set_attr "type" "f_rint")] +) + (define_insn "aarch64_" [(set (match_operand:VSFDF 0 "register_operand" "=w") (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X UNSPEC_FRINT64Z UNSPEC_FRINT64X]) +(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z]) + (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB]) (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB]) @@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s") (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x") (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")]) +(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")]) + ;; The condition associated with an UNSPEC_COND_. 
(define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq") (UNSPEC_COND_CMPGE_WIDE "ge") diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6175,6 +6175,13 @@ operands; otherwise, it may not. This pattern is not allowed to @code{FAIL}. +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern +@item @samp{ftrunc@var{m}@var{n}2} +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store +the result in operand 0. Both operands have mode @var{m}, which is a scalar or +vector floating-point mode. + + @cindex @code{round@var{m}2} instruction pattern @item @samp{round@var{m}2} Round operand 1 to the nearest integer, rounding away from zero in the diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index bb13c6cce1bf55633760bc14980402f1f0ac1689..64263cbb83548b140f613cb4bf5ce6565373f96d 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -269,6 +269,8 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary) DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary) DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary) DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary) +DEF_INTERNAL_OPTAB_FN (FTRUNC32, ECF_CONST, ftrunc32, unary) +DEF_INTERNAL_OPTAB_FN (FTRUNC64, ECF_CONST, ftrunc64, unary) /* Binary math functions. */ DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary) diff --git a/gcc/match.pd b/gcc/match.pd index a319aefa8081ac177981ad425c461f8a771128f4..