[PATCH] Introduce sh_mul and uh_mul RTX codes for high-part multiplications
This patch introduces new RTX codes to allow the RTL passes and
backends to consistently represent high-part multiplications.
Currently, the RTL used by different backends for expanding
smul<mode>3_highpart and umul<mode>3_highpart varies greatly,
with many but not all choosing to express this something like:

(define_insn "smuldi3_highpart"
  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
        (truncate:DI
         (lshiftrt:TI
          (mult:TI (sign_extend:TI
                    (match_operand:DI 1 "nvptx_register_operand" "R"))
                   (sign_extend:TI
                    (match_operand:DI 2 "nvptx_register_operand" "R")))
          (const_int 64))))]
  ""
  "%.\\tmul.hi.s64\\t%0, %1, %2;")

One complication with using this "widening multiplication" representation
is that it requires an intermediate in a wider mode, making it difficult
or impossible to encode a high-part multiplication of the widest supported
integer mode. A second is that it can interfere with optimization; for
example simplify-rtx.c contains the comment:

    case TRUNCATE:
      /* Don't optimize (lshiftrt (mult ...)) as it would interfere
         with the umulXi3_highpart patterns. */

Hopefully these problems are solved (or reduced) by introducing a
new canonical form for high-part multiplications in RTL passes.
This also simplifies insn patterns when one operand is constant.

Whilst implementing some constant folding simplifications and
compile-time evaluation of these new RTX codes, I noticed that
this functionality could also be added for the existing saturating
arithmetic RTX codes. Then likewise when documenting these new RTX
codes, I also took the opportunity to silence the @xref warnings in
invoke.texi.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures. Ok for mainline?


2021-09-25 Roger Sayle

gcc/ChangeLog
    * gcc/rtl.def (SH_MULT, UH_MULT): New RTX codes for representing
    signed and unsigned high-part multiplication respectively.
    * gcc/simplify-rtx.c (simplify_binary_operation_1) [SH_MULT,
    UH_MULT]: Simplify high-part multiplications by zero.
    [SS_PLUS, US_PLUS, SS_MINUS, US_MINUS, SS_MULT, US_MULT,
    SS_DIV, US_DIV]: Similar simplifications for saturating
    arithmetic.
    (simplify_const_binary_operation) [SS_PLUS, US_PLUS, SS_MINUS,
    US_MINUS, SS_MULT, US_MULT, SH_MULT, UH_MULT]: Implement
    compile-time evaluation for constant operands.
    * gcc/dwarf2out.c (mem_loc_descriptor): Skip SH_MULT and UH_MULT.
    * doc/rtl.texi (sh_mult, uhmult): Document new RTX codes.
    * doc/md.texi (smul@var{m}3_highpart, umul@var{m3}_highpart):
    Mention the new sh_mul and uh_mul RTX codes.
    * doc/invoke.texi: Silence @xref "compilation" warnings.

Roger
--

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4acb941..2de7d99 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3125,7 +3125,7 @@ errors if these functions are not inlined everywhere they are called.
 @itemx -fno-modules-ts
 @opindex fmodules-ts
 @opindex fno-modules-ts
-Enable support for C++20 modules (@xref{C++ Modules}). The
+Enable support for C++20 modules, see @xref{C++ Modules}. The
 @option{-fno-modules-ts} is usually not needed, as that is the
 default. Even though this is a C++20 feature, it is not currently
 implicitly enabled by selecting that standard version.
@@ -33553,7 +33553,7 @@ version selected, although in pre-C++20 versions, it is of course an
 extension.
 
 No new source file suffixes are required or supported. If you wish to
-use a non-standard suffix (@xref{Overall Options}), you also need
+use a non-standard suffix, see @xref{Overall Options}, you also need
 to provide a @option{-x c++} option too.@footnote{Some users like to
 distinguish module interface files with a new suffix, such as naming
 the source @code{module.cppm}, which involves
@@ -33615,8 +33615,8 @@ to be resolved at the end of compilation. Without this, imported
 macros are only resolved when expanded or (re)defined.
 This option detects conflicting import definitions for all macros.
 
-@xref{C++ Module Mapper} for details of the @option{-fmodule-mapper}
-family of options.
+For details of the @option{-fmodule-mapper} family of options,
+see @xref{C++ Module Mapper}.
 
 @menu
 * C++ Module Mapper:: Module Mapper
@@ -33833,8 +33833,8 @@ dialect used and imports of the module.@footnote{The precise contents
 of this output may change.} The timestamp is the same value as that
 provided by the @code{__DATE__} & @code{__TIME__} macros, and may be
 explicitly specified with the environment variable
-@code{SOURCE_DATE_EPOCH}. @xref{Environment Variables} for further
-details.
+@code{SOURCE_DATE_EPOCH}. For further details see
+@xref{Environment Variables}.
 
 A set of related CMIs may be copied, provided the relative pathnames
 are preserved.
diff --git a/gcc/doc/md.texi b/gcc/doc/m
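For readers less familiar with the operation, the following is a minimal C sketch of what a high-part multiplication computes. It assumes, based on the nvptx pattern quoted in the patch description, that the result is the upper half of the double-width product. The function names are hypothetical and are not taken from GCC. The 64-bit variant illustrates the "widest mode" complication: when no wider integer type is available, the high part has to be assembled from 32-bit partial products.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned high part of a 32x32 multiply.
   Mirrors the "zero-extend, multiply, shift, truncate" form.  */
static uint32_t umul_highpart32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Hypothetical helper: signed high part of a 32x32 multiply.
   Assumes >> on a negative int64_t is an arithmetic shift, which is
   implementation-defined in ISO C but is what GCC documents.  */
static int32_t smul_highpart32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product >> 32);
}

/* Unsigned high part of a 64x64 multiply without any 128-bit type,
   built from four 32x32->64 partial products.  */
static uint64_t umul_highpart64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of everything that lands in bits 32..95; cannot overflow.  */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* The high part of x * 0 is always 0, which is the identity behind
   the "multiplication by zero" simplification in the ChangeLog.  */
static void check_highpart_by_zero(int32_t x)
{
    assert(umul_highpart32((uint32_t)x, 0) == 0);
    assert(smul_highpart32(x, 0) == 0);
}
```

Calling `check_highpart_by_zero` with any value should pass silently. The 32-bit helpers depend on a wider intermediate type, which is exactly what is unavailable for the widest supported integer mode.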
Re: [PATCH] Introduce sh_mul and uh_mul RTX codes for high-part multiplications
On Sep 25 2021, Roger Sayle wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4acb941..2de7d99 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -3125,7 +3125,7 @@ errors if these functions are not inlined everywhere
> they are called.
>  @itemx -fno-modules-ts
>  @opindex fmodules-ts
>  @opindex fno-modules-ts
> -Enable support for C++20 modules (@xref{C++ Modules}). The
> +Enable support for C++20 modules, see @xref{C++ Modules}. The

Or (@pxref{...}).

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Re: [PATCH] Introduce sh_mul and uh_mul RTX codes for high-part multiplications
"Roger Sayle" writes:

> This patch introduces new RTX codes to allow the RTL passes and
> backends to consistently represent high-part multiplications.
> Currently, the RTL used by different backends for expanding
> smul<mode>3_highpart and umul<mode>3_highpart varies greatly,
> with many but not all choosing to express this something like:
>
> (define_insn "smuldi3_highpart"
>   [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
>         (truncate:DI
>          (lshiftrt:TI
>           (mult:TI (sign_extend:TI
>                     (match_operand:DI 1 "nvptx_register_operand" "R"))
>                    (sign_extend:TI
>                     (match_operand:DI 2 "nvptx_register_operand" "R")))
>           (const_int 64))))]
>   ""
>   "%.\\tmul.hi.s64\\t%0, %1, %2;")
>
> One complication with using this "widening multiplication" representation
> is that it requires an intermediate in a wider mode, making it difficult
> or impossible to encode a high-part multiplication of the widest supported
> integer mode.

Yeah. It's also a problem when representing vector ops.

> A second is that it can interfere with optimization; for
> example simplify-rtx.c contains the comment:
>
>     case TRUNCATE:
>       /* Don't optimize (lshiftrt (mult ...)) as it would interfere
>          with the umulXi3_highpart patterns. */
>
> Hopefully these problems are solved (or reduced) by introducing a
> new canonical form for high-part multiplications in RTL passes.
> This also simplifies insn patterns when one operand is constant.
>
> Whilst implementing some constant folding simplifications and
> compile-time evaluation of these new RTX codes, I noticed that
> this functionality could also be added for the existing saturating
> arithmetic RTX codes. Then likewise when documenting these new RTX
> codes, I also took the opportunity to silence the @xref warnings in
> invoke.texi.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures. Ok for mainline?
>
>
> 2021-09-25 Roger Sayle
>
> gcc/ChangeLog
>     * gcc/rtl.def (SH_MULT, UH_MULT): New RTX codes for representing
>     signed and unsigned high-part multiplication respectively.
>     * gcc/simplify-rtx.c (simplify_binary_operation_1) [SH_MULT,
>     UH_MULT]: Simplify high-part multiplications by zero.
>     [SS_PLUS, US_PLUS, SS_MINUS, US_MINUS, SS_MULT, US_MULT,
>     SS_DIV, US_DIV]: Similar simplifications for saturating
>     arithmetic.
>     (simplify_const_binary_operation) [SS_PLUS, US_PLUS, SS_MINUS,
>     US_MINUS, SS_MULT, US_MULT, SH_MULT, UH_MULT]: Implement
>     compile-time evaluation for constant operands.
>     * gcc/dwarf2out.c (mem_loc_descriptor): Skip SH_MULT and UH_MULT.
>     * doc/rtl.texi (sh_mult, uhmult): Document new RTX codes.
>     * doc/md.texi (smul@var{m}3_highpart, umul@var{m3}_highpart):
>     Mention the new sh_mul and uh_mul RTX codes.
>     * doc/invoke.texi: Silence @xref "compilation" warnings.

Looks like a good idea to me. Only real comment is on the naming:
if possible, I think we should try to avoid introducing yet more
differences between optab names and rtl codes. How about umul_highpart
for the unsigned code, to match both the optab and the existing
convention of adding “u” directly to the front of non-saturating
operations?

Things are more inconsistent for signed rtx codes: sometimes the
“s” is present and sometimes it isn't. But since “smin” and “smax”
have it, I think we can justify having it here too.

So I think we should use smul_highpart and umul_highpart.
It's a bit more wordy than sh_mul, but still a lot shorter than
the status quo ;-)

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index ebad5cb..b4b04b9 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4142,11 +4142,40 @@ simplify_context::simplify_binary_operation_1
> (rtx_code code,
>      case US_PLUS:
>      case SS_MINUS:
>      case US_MINUS:
> +      /* Simplify x + 0 to x, if possible. */

Nit: +/-

> +      if (trueop1 == CONST0_RTX (mode) && !HONOR_SIGNED_ZEROS (mode))

The HONOR_SIGNED_ZEROS check is redundant, since these ops don't support
modes with signed zero.

Same for the other HONOR_* macros in the patch. E.g. I don't think
we should try to guess how infinities and saturation work together.

> +        return op0;
> +      return 0;
> +
>      case SS_MULT:
>      case US_MULT:
> +      /* Simplify x * 0 to 0, if possible. */
> +      if (trueop1 == CONST0_RTX (mode)
> +          && !HONOR_NANS (mode)
> +          && !HONOR_SIGNED_ZEROS (mode)
> +          && !side_effects_p (op0))
> +        return op1;
> +
> +      /* Simplify x * 1 to x, if possible. */
> +      if (trueop1 == CONST1_RTX (mode) && !HONOR_SNANS (mode))
> +        return op0;
> +      return 0;
> +
> +    case SH_MULT:
> +    case UH_MULT:
> +      /* Simplify x * 0 to 0, if possible. */
> +      if (trueop1 == CONST0_RTX (mode)
> +          && !HONOR_NANS (mode)
> +          && !HONOR_