https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <u...@gcc.gnu.org>:

https://gcc.gnu.org/g:ad5b757d99b5a121198b79a6a42c1f15ae86a190

commit r14-3085-gad5b757d99b5a121198b79a6a42c1f15ae86a190
Author: Uros Bizjak <ubiz...@gmail.com>
Date:   Tue Aug 8 18:53:51 2023 +0200

    i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

    Also introduce -m[no-]partial-vector-fp-math option to disable trapping
    V2SF named patterns in order to avoid generation of partial vector
    V4SFmode trapping instructions.

    The new option is enabled by default, because even with sanitization,
    a small but consistent speed up of 2 to 3% with Polyhedron capacita
    benchmark can be achieved vs. scalar code.

    Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
    vs. scalar code.  This is what clang does by default, as it defaults
    to -fno-trapping-math.

            PR target/110832

    gcc/ChangeLog:

            * config/i386/i386.opt (mpartial-vector-fp-math): New option.
            * config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
            upper part of V2SFmode register with -fno-trapping-math.
            (<plusminusmult:insn>v2sf3): Enable for ix86_partial_vec_fp_math.
            (divv2sf3): Ditto.
            (<smaxmin:code>v2sf3): Ditto.
            (sqrtv2sf2): Ditto.
            (*mmx_haddv2sf3_low): Ditto.
            (*mmx_hsubv2sf3_low): Ditto.
            (vec_addsubv2sf3): Ditto.
            (vec_cmpv2sfv2si): Ditto.
            (vcond<V2FI:mode>v2sf): Ditto.
            (fmav2sf4): Ditto.
            (fmsv2sf4): Ditto.
            (fnmav2sf4): Ditto.
            (fnmsv2sf4): Ditto.
            (fix_truncv2sfv2si2): Ditto.
            (fixuns_truncv2sfv2si2): Ditto.
            (floatv2siv2sf2): Ditto.
            (floatunsv2siv2sf2): Ditto.
            (nearbyintv2sf2): Ditto.
            (rintv2sf2): Ditto.
            (lrintv2sfv2si2): Ditto.
            (ceilv2sf2): Ditto.
            (lceilv2sfv2si2): Ditto.
            (floorv2sf2): Ditto.
            (lfloorv2sfv2si2): Ditto.
            (btruncv2sf2): Ditto.
            (roundv2sf2): Ditto.
            (lroundv2sfv2si2): Ditto.
            * doc/invoke.texi (x86 Options): Document
            -mpartial-vector-fp-math option.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110832-1.c: New test.
            * gcc.target/i386/pr110832-2.c: New test.
            * gcc.target/i386/pr110832-3.c: New test.
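For readers who want to see which source constructs are affected, here is a minimal sketch (not taken from the commit or its testcases) of C code using GCC's generic vector extensions with an 8-byte `float` vector, which GCC models as V2SFmode on x86. The typedefs `v2sf`/`v2si` and the functions `add2`, `mul2`, `div2`, `trunc2` and `check_v2sf_ops` are hypothetical names chosen for illustration. The assumption is that these operations are candidates for the named patterns listed above (`addv2sf3`, `mulv2sf3`, `divv2sf3`, `fix_truncv2sfv2si2`); the exact instructions emitted depend on the GCC version and flags.

```c
#include <assert.h>

/* Hypothetical typedef: two packed floats in 64 bits (V2SFmode on x86). */
typedef float v2sf __attribute__((vector_size(8)));
/* Hypothetical typedef: two packed 32-bit ints (V2SImode). */
typedef int v2si __attribute__((vector_size(8)));

/* Generic vector operations that GCC may expand through the V2SF named
   patterns (addv2sf3, mulv2sf3, divv2sf3, fix_truncv2sfv2si2) when
   -mpartial-vector-fp-math is in effect.  This is an assumption about
   code generation, not a guarantee. */
v2sf add2(v2sf a, v2sf b) { return a + b; }
v2sf mul2(v2sf a, v2sf b) { return a * b; }
v2sf div2(v2sf a, v2sf b) { return a / b; }

/* Float-to-int conversion with truncation toward zero (GCC 9+ builtin). */
v2si trunc2(v2sf a) { return __builtin_convertvector(a, v2si); }

/* Only the two meaningful lanes are observable from C, so the results
   must be identical whether GCC uses partial V4SF instructions or
   scalar code. */
void check_v2sf_ops(void) {
    v2sf a = {1.5f, -2.0f};
    v2sf b = {0.5f, 4.0f};
    v2sf s = add2(a, b);
    v2sf p = mul2(a, b);
    v2sf q = div2(a, b);
    v2si t = trunc2(a);
    assert(s[0] == 2.0f && s[1] == 2.0f);
    assert(p[0] == 0.75f && p[1] == -8.0f);
    assert(q[0] == 3.0f && q[1] == -0.5f);
    assert(t[0] == 1 && t[1] == -2);
}
```

To observe the effect, one might compile this with `gcc -O2 -S` using a GCC that contains this commit and compare the assembly for `add2` across three builds: the default, `-fno-trapping-math` (where the zeroing of the upper half of the XMM register before the packed operation would be expected to disappear), and `-mno-partial-vector-fp-math` (where one would expect scalar code instead of a packed instruction).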