Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
Hi Christophe, On 11/12/19 10:29 AM, Christophe Lyon wrote: On Thu, 7 Nov 2019 at 11:26, Kyrill Tkachov wrote: Hi all, This patch adds the plumbing for and an implementation of the saturation intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics. These intrinsics set the Q sticky bit in APSR if an overflow occurred. ACLE allows the user to read that bit (within the same function, it's not defined across function boundaries) using the __saturation_occurred intrinsic and reset it using __set_saturation_occurred. Thus, if the user cares about the Q bit they would be using a flow such as: __set_saturation_occurred (0); // reset the Q bit ... __ssat (...) // Do some calculations involving __ssat ... if (__saturation_occurred ()) // if Q bit set handle overflow ... For the implementation this has a few implications: * We must track the Q-setting side-effects of these instructions to make sure saturation reading/writing intrinsics are ordered properly. This is done by introducing a new "apsrq" register (and associated APSRQ_REGNUM) in a similar way to the "fake"" cc register. * The RTL patterns coming out of these intrinsics can have two forms: one where they set the APSRQ_REGNUM and one where they don't. Which one is used depends on whether the function cares about reading the Q flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the __saturation_occurred, __set_saturation_occurred occurrences. If no Q-flag read is present in the function we'll use the simpler non-Q-setting form to allow for more aggressive scheduling and such. If a Q-bit read is present then the Q-setting form is emitted. To avoid adding two patterns for each intrinsic to the MD file we make use of define_subst to auto-generate the Q-setting forms * Some existing patterns already produce instructions that may clobber the Q bit, but they don't model it (as we didn't care about that bit up till now). Since these patterns can be generated from straight-line C code they can affect the Q-bit reads from intrinsics. Therefore they have to be disabled when a Q-bit read is present. These are mostly patterns in arm-fixed.md that are not very common anyway, but there are also a couple of widening multiply-accumulate patterns in arm.md that can set the Q-bit during accumulation. There are more Q-setting intrinsics in ACLE, but these will be implemented in a more mechanical fashion once the infrastructure in this patch goes in. Bootstrapped and tested on arm-none-linux-gnueabihf. Committing to trunk. Thanks, Kyrill 2019-11-07 Kyrylo Tkachov * config/arm/aout.h (REGISTER_NAMES): Add apsrq. * config/arm/arm.md (APSRQ_REGNUM): Define. (add_setq): New define_subst. (add_clobber_q_name): New define_subst_attr. (add_clobber_q_pred): Likewise. (maddhisi4): Change to define_expand. Split into mult and add if ARM_Q_BIT_READ. (arm_maddhisi4): New define_insn. (*maddhisi4tb): Disable for ARM_Q_BIT_READ. (*maddhisi4tt): Likewise. (arm_ssat): New define_expand. (arm_usat): Likewise. (arm_get_apsr): New define_insn. (arm_set_apsr): Likewise. (arm_saturation_occurred): New define_expand. (arm_set_saturation): Likewise. (*satsi_): Rename to... (satsi_): ... This. (*satsi__shift): Disable for ARM_Q_BIT_READ. * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed. (CALL_USED_REGISTERS): Mark apsrq. (FIRST_PSEUDO_REGISTER): Update value. (REG_ALLOC_ORDER): Add APSRQ_REGNUM. (machine_function): Add q_bit_access. (ARM_Q_BIT_READ): Define. * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define. (arm_conditional_register_usage): Clear APSRQ_REGNUM from operand_reg_set. (arm_q_bit_access): Define. * config/arm/arm-builtins.c: Include stringpool.h. (arm_sat_binop_imm_qualifiers, arm_unsigned_sat_binop_unsigned_imm_qualifiers, arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define. (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS, SET_SAT_QUALIFIERS): Likewise. (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK. (arm_init_acle_builtins): Initialize __builtin_sat_imm_check. Handle 0 argument expander. (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK. (arm_check_builtin_call): Define. * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3, arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ. * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype. (arm_q_bit_access): Likewise. * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation, __saturation_occurred, __set_saturation_occurred): Define. * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat, saturation_occurred, set_saturation_occurred. * config/arm/unspecs.md (UNSPEC_Q_S
Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
On Thu, 7 Nov 2019 at 11:26, Kyrill Tkachov wrote: > > Hi all, > > This patch adds the plumbing for and an implementation of the saturation > intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics. > These intrinsics set the Q sticky bit in APSR if an overflow occurred. > ACLE allows the user to read that bit (within the same function, it's not > defined across function boundaries) using the __saturation_occurred > intrinsic > and reset it using __set_saturation_occurred. > Thus, if the user cares about the Q bit they would be using a flow such as: > > __set_saturation_occurred (0); // reset the Q bit > ... > __ssat (...) // Do some calculations involving __ssat > ... > if (__saturation_occurred ()) // if Q bit set handle overflow >... > > For the implementation this has a few implications: > * We must track the Q-setting side-effects of these instructions to make > sure > saturation reading/writing intrinsics are ordered properly. > This is done by introducing a new "apsrq" register (and associated > APSRQ_REGNUM) in a similar way to the "fake"" cc register. > > * The RTL patterns coming out of these intrinsics can have two forms: > one where they set the APSRQ_REGNUM and one where they don't. > Which one is used depends on whether the function cares about reading the Q > flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the > __saturation_occurred, __set_saturation_occurred occurrences. > If no Q-flag read is present in the function we'll use the simpler > non-Q-setting form to allow for more aggressive scheduling and such. > If a Q-bit read is present then the Q-setting form is emitted. > To avoid adding two patterns for each intrinsic to the MD file we make > use of define_subst to auto-generate the Q-setting forms > > * Some existing patterns already produce instructions that may clobber the > Q bit, but they don't model it (as we didn't care about that bit up till > now). > Since these patterns can be generated from straight-line C code they can > affect > the Q-bit reads from intrinsics. Therefore they have to be disabled when > a Q-bit read is present. These are mostly patterns in arm-fixed.md that are > not very common anyway, but there are also a couple of widening > multiply-accumulate patterns in arm.md that can set the Q-bit during > accumulation. > > There are more Q-setting intrinsics in ACLE, but these will be > implemented in > a more mechanical fashion once the infrastructure in this patch goes in. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Committing to trunk. > Thanks, > Kyrill > > > 2019-11-07 Kyrylo Tkachov > > * config/arm/aout.h (REGISTER_NAMES): Add apsrq. > * config/arm/arm.md (APSRQ_REGNUM): Define. > (add_setq): New define_subst. > (add_clobber_q_name): New define_subst_attr. > (add_clobber_q_pred): Likewise. > (maddhisi4): Change to define_expand. Split into mult and add if > ARM_Q_BIT_READ. > (arm_maddhisi4): New define_insn. > (*maddhisi4tb): Disable for ARM_Q_BIT_READ. > (*maddhisi4tt): Likewise. > (arm_ssat): New define_expand. > (arm_usat): Likewise. > (arm_get_apsr): New define_insn. > (arm_set_apsr): Likewise. > (arm_saturation_occurred): New define_expand. > (arm_set_saturation): Likewise. > (*satsi_): Rename to... > (satsi_): ... This. > (*satsi__shift): Disable for ARM_Q_BIT_READ. > * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed. > (CALL_USED_REGISTERS): Mark apsrq. > (FIRST_PSEUDO_REGISTER): Update value. > (REG_ALLOC_ORDER): Add APSRQ_REGNUM. > (machine_function): Add q_bit_access. > (ARM_Q_BIT_READ): Define. > * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define. > (arm_conditional_register_usage): Clear APSRQ_REGNUM from > operand_reg_set. > (arm_q_bit_access): Define. > * config/arm/arm-builtins.c: Include stringpool.h. > (arm_sat_binop_imm_qualifiers, > arm_unsigned_sat_binop_unsigned_imm_qualifiers, > arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define. > (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, > UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS, > SET_SAT_QUALIFIERS): Likewise. > (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK. > (arm_init_acle_builtins): Initialize __builtin_sat_imm_check. > Handle 0 argument expander. > (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK. > (arm_check_builtin_call): Define. > * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3, > arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ. > * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype. > (arm_q_bit_access): Likewise. > * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation, > __saturation_occurred, __set_saturation_occurred): Define. > * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
Hi Richard, On 11/9/19 12:44 PM, Richard Henderson wrote: On 11/7/19 11:26 AM, Kyrill Tkachov wrote: -;; The code sequence emitted by this insn pattern uses the Q flag, which GCC -;; doesn't generally know about, so we don't bother expanding to individual -;; instructions. It may be better to just use an out-of-line asm libcall for -;; this. +;; The code sequence emitted by this insn pattern uses the Q flag, so we need +;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead. + +(define_expand "ssmulsa3" + [(parallel [(set (match_operand:SA 0 "s_register_operand") + (ss_mult:SA (match_operand:SA 1 "s_register_operand") + (match_operand:SA 2 "s_register_operand"))) + (clobber (match_scratch:DI 3)) + (clobber (match_scratch:SI 4)) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_32BIT && arm_arch6" + { +if (ARM_Q_BIT_READ) + FAIL; + } +) Coming back to this, why would you not just represent the update of the Q bit? This is not generated by generic pattern matching, but by the __ssmulsa3 builtin function. It seems easy to me to simply describe how this older builtin operates in conjunction with the new acle builtins. I recognize that ssadd3 etc are more difficult, because they can be generated by arithmetic operations on TYPE_SATURATING. Although again it seems weird to generate expensive out-of-line code for TYPE_SATURATING when used in conjunction with acle builtins. I think it would be better to merely expand the documentation. Even if only so far as to say "unsupported to mix these". I'm tempted to agree, as this part of the patch is quite ugly. Thank you for the comments on these patches, I wasn't aware of some of the mechanisms. I guess I should have posted the series as an RFC first... I'll send patches to fix up the issues. Thanks, Kyrill +(define_expand "maddhisi4" + [(set (match_operand:SI 0 "s_register_operand") + (plus:SI (mult:SI (sign_extend:SI + (match_operand:HI 1 "s_register_operand")) + (sign_extend:SI + (match_operand:HI 2 "s_register_operand"))) +(match_operand:SI 3 "s_register_operand")))] + "TARGET_DSP_MULTIPLY" + { +/* If this function reads the Q bit from ACLE intrinsics break up the + multiplication and accumulation as an overflow during accumulation will + clobber the Q flag. */ +if (ARM_Q_BIT_READ) + { + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2])); + emit_insn (gen_addsi3 (operands[0], tmp, operands[3])); + DONE; + } + } +) + +(define_insn "*arm_maddhisi4" [(set (match_operand:SI 0 "s_register_operand" "=r") (plus:SI (mult:SI (sign_extend:SI (match_operand:HI 1 "s_register_operand" "r")) (sign_extend:SI (match_operand:HI 2 "s_register_operand" "r"))) (match_operand:SI 3 "s_register_operand" "r")))] - "TARGET_DSP_MULTIPLY" + "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ" "smlabb%?\\t%0, %1, %2, %3" [(set_attr "type" "smlaxy") (set_attr "predicable" "yes")] I think this case would be better represented with a single define_insn_and_split and a peephole2. It is easy to notice during peep2 whether or not the Q bit is actually live at the exact place we want to expand this operation. If it is live, then use two insns; if it isn't, use one. r~
Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
> +;; define_subst and associated attributes > + > +(define_subst "add_setq" > + [(set (match_operand:SI 0 "" "") > +(match_operand:SI 1 "" ""))] > + "" > + [(set (match_dup 0) > +(match_dup 1)) > + (set (reg:CC APSRQ_REGNUM) > + (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]) > + > +(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq") > +(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ" > +"ARM_Q_BIT_READ") Is there a good reason to use CCmode for the Q bit instead of BImode? Is there a good reason not to represent the clobber of the Q bit when we're not representing the set? Although it probably doesn't matter, because of the unspec, the update of the Q bit here in the subst isn't right. Better would be (set (reg:BI APSRQ_REGNUM) (ior:BI (unspec:BI [(match_dup 1)] UNSPEC_Q_SET) (reg:BI APSRQ_REGNUM))) > +/* Implement TARGET_CHECK_BUILTIN_CALL. Record a read of the Q bit through > + intrinsics in the machine function. */ > +bool > +arm_check_builtin_call (location_t , vec , tree fndecl, > + tree, unsigned int, tree *) > +{ > + int fcode = DECL_MD_FUNCTION_CODE (fndecl); > + if (fcode == ARM_BUILTIN_saturation_occurred > + || fcode == ARM_BUILTIN_set_saturation) > +{ > + if (cfun && cfun->decl) > + DECL_ATTRIBUTES (cfun->decl) > + = tree_cons (get_identifier ("acle qbit"), NULL_TREE, > +DECL_ATTRIBUTES (cfun->decl)); > +} > + return true; > +} Where does this attribute affect inlining? Or get merged into the signature of a calling function during inlining? This check happens way way early in the c front end, while doing semantic checks. I think this is the wrong hook to use, especially since this is not a semantic check of the arguments to the builtin. I think you want to record the use of such a builtin during rtl expansion, within the define_expand for the buitin, setting a bool in struct machine_function. Then use that bool in the arm.md predicates. (I'll note that there are 5 "int" fields in the arm machine_function that should be bool, now that we're not using C89. Probably best to re-order all of them to the end and add your new bool next to them.) > +(define_insn "arm_set_apsr" > + [(set (reg:CC APSRQ_REGNUM) > + (unspec_volatile:CC > + [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))] > + "TARGET_ARM_QBIT" > + "msr%?\tAPSR_nzcvq, %0" > + [(set_attr "predicable" "yes") > + (set_attr "conds" "set")] > +) This is missing a clobber (or set) of CC_REGNUM. Why unspec_volatile and not unspec? > +;; Read the APSR and set the Q bit (bit position 27) according to operand 0 > +(define_expand "arm_set_saturation" > + [(match_operand:SI 0 "reg_or_int_operand")] > + "TARGET_ARM_QBIT" > + { > +rtx apsr = gen_reg_rtx (SImode); > +emit_insn (gen_arm_get_apsr (apsr)); > +rtx to_insert = gen_reg_rtx (SImode); > +if (CONST_INT_P (operands[0])) > + emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode) > + ? CONST0_RTX (SImode) : CONST1_RTX (SImode)); > +else > + { > +rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode)); > +emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0], > + CONST0_RTX (SImode))); > + } > +emit_insn (gen_insv (apsr, CONST1_RTX (SImode), > +gen_int_mode (27, SImode), to_insert)); > +emit_insn (gen_arm_set_apsr (apsr)); > +DONE; > + } > +) Why are you preserving APSR.NZCV across this operation? It should not be live during the builtin that expands this. r~
[PATCH][arm][1/X] Add initial support for saturation intrinsics
Hi all, This patch adds the plumbing for and an implementation of the saturation intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics. These intrinsics set the Q sticky bit in APSR if an overflow occurred. ACLE allows the user to read that bit (within the same function, it's not defined across function boundaries) using the __saturation_occurred intrinsic and reset it using __set_saturation_occurred. Thus, if the user cares about the Q bit they would be using a flow such as: __set_saturation_occurred (0); // reset the Q bit ... __ssat (...) // Do some calculations involving __ssat ... if (__saturation_occurred ()) // if Q bit set handle overflow ... For the implementation this has a few implications: * We must track the Q-setting side-effects of these instructions to make sure saturation reading/writing intrinsics are ordered properly. This is done by introducing a new "apsrq" register (and associated APSRQ_REGNUM) in a similar way to the "fake"" cc register. * The RTL patterns coming out of these intrinsics can have two forms: one where they set the APSRQ_REGNUM and one where they don't. Which one is used depends on whether the function cares about reading the Q flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the __saturation_occurred, __set_saturation_occurred occurrences. If no Q-flag read is present in the function we'll use the simpler non-Q-setting form to allow for more aggressive scheduling and such. If a Q-bit read is present then the Q-setting form is emitted. To avoid adding two patterns for each intrinsic to the MD file we make use of define_subst to auto-generate the Q-setting forms * Some existing patterns already produce instructions that may clobber the Q bit, but they don't model it (as we didn't care about that bit up till now). Since these patterns can be generated from straight-line C code they can affect the Q-bit reads from intrinsics. Therefore they have to be disabled when a Q-bit read is present. These are mostly patterns in arm-fixed.md that are not very common anyway, but there are also a couple of widening multiply-accumulate patterns in arm.md that can set the Q-bit during accumulation. There are more Q-setting intrinsics in ACLE, but these will be implemented in a more mechanical fashion once the infrastructure in this patch goes in. Bootstrapped and tested on arm-none-linux-gnueabihf. Committing to trunk. Thanks, Kyrill 2019-11-07 Kyrylo Tkachov * config/arm/aout.h (REGISTER_NAMES): Add apsrq. * config/arm/arm.md (APSRQ_REGNUM): Define. (add_setq): New define_subst. (add_clobber_q_name): New define_subst_attr. (add_clobber_q_pred): Likewise. (maddhisi4): Change to define_expand. Split into mult and add if ARM_Q_BIT_READ. (arm_maddhisi4): New define_insn. (*maddhisi4tb): Disable for ARM_Q_BIT_READ. (*maddhisi4tt): Likewise. (arm_ssat): New define_expand. (arm_usat): Likewise. (arm_get_apsr): New define_insn. (arm_set_apsr): Likewise. (arm_saturation_occurred): New define_expand. (arm_set_saturation): Likewise. (*satsi_): Rename to... (satsi_): ... This. (*satsi__shift): Disable for ARM_Q_BIT_READ. * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed. (CALL_USED_REGISTERS): Mark apsrq. (FIRST_PSEUDO_REGISTER): Update value. (REG_ALLOC_ORDER): Add APSRQ_REGNUM. (machine_function): Add q_bit_access. (ARM_Q_BIT_READ): Define. * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define. (arm_conditional_register_usage): Clear APSRQ_REGNUM from operand_reg_set. (arm_q_bit_access): Define. * config/arm/arm-builtins.c: Include stringpool.h. (arm_sat_binop_imm_qualifiers, arm_unsigned_sat_binop_unsigned_imm_qualifiers, arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define. (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS, SET_SAT_QUALIFIERS): Likewise. (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK. (arm_init_acle_builtins): Initialize __builtin_sat_imm_check. Handle 0 argument expander. (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK. (arm_check_builtin_call): Define. * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3, arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ. * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype. (arm_q_bit_access): Likewise. * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation, __saturation_occurred, __set_saturation_occurred): Define. * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat, saturation_occurred, set_saturation_occurred. * config/arm/unspecs.md (UNSPEC_Q_SET): Define. (UNSPEC_APSR_READ): Likewise. (VUNSPEC_APSR_WRITE): Likewise. * config/arm/arm-fixed.md (ssadd3): Convert to define_expand. (*arm_ssadd3): New define_insn. (sssub3): Convert t