https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <u...@gcc.gnu.org>:

https://gcc.gnu.org/g:ad5b757d99b5a121198b79a6a42c1f15ae86a190

commit r14-3085-gad5b757d99b5a121198b79a6a42c1f15ae86a190
Author: Uros Bizjak <ubiz...@gmail.com>
Date:   Tue Aug 8 18:53:51 2023 +0200

    i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

    Also introduce -m[no-]partial-vector-fp-math option to disable trapping
    V2SF named patterns in order to avoid generation of partial vector V4SFmode
    trapping instructions.

    The new option is enabled by default, because even with sanitization,
    a small but consistent speedup of 2 to 3% on the Polyhedron capacita
    benchmark can be achieved vs. scalar code.

    Using -fno-trapping-math improves Polyhedron capacita runtime by 8 to 9%
    vs. scalar code.  This is what clang does by default, as it defaults
    to -fno-trapping-math.

            PR target/110832

    gcc/ChangeLog:

            * config/i386/i386.opt (mpartial-vector-fp-math): New option.
            * config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
            upper part of V2SFmode register with -fno-trapping-math.
            (<plusminusmult:insn>v2sf3): Enable for ix86_partial_vec_fp_math.
            (divv2sf3): Ditto.
            (<smaxmin:code>v2sf3): Ditto.
            (sqrtv2sf2): Ditto.
            (*mmx_haddv2sf3_low): Ditto.
            (*mmx_hsubv2sf3_low): Ditto.
            (vec_addsubv2sf3): Ditto.
            (vec_cmpv2sfv2si): Ditto.
            (vcond<V2FI:mode>v2sf): Ditto.
            (fmav2sf4): Ditto.
            (fmsv2sf4): Ditto.
            (fnmav2sf4): Ditto.
            (fnmsv2sf4): Ditto.
            (fix_truncv2sfv2si2): Ditto.
            (fixuns_truncv2sfv2si2): Ditto.
            (floatv2siv2sf2): Ditto.
            (floatunsv2siv2sf2): Ditto.
            (nearbyintv2sf2): Ditto.
            (rintv2sf2): Ditto.
            (lrintv2sfv2si2): Ditto.
            (ceilv2sf2): Ditto.
            (lceilv2sfv2si2): Ditto.
            (floorv2sf2): Ditto.
            (lfloorv2sfv2si2): Ditto.
            (btruncv2sf2): Ditto.
            (roundv2sf2): Ditto.
            (lroundv2sfv2si2): Ditto.
            * doc/invoke.texi (x86 Options): Document
            -mpartial-vector-fp-math option.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110832-1.c: New test.
            * gcc.target/i386/pr110832-2.c: New test.
            * gcc.target/i386/pr110832-3.c: New test.
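
The kind of source code affected by the change above can be sketched as follows. This is a minimal illustration, not one of the pr110832 tests from the commit: the type name v2sf and the functions add_v2sf and div_v2sf are hypothetical, and the vector type relies on the GCC/clang vector_size extension. According to the commit message, operations like these are expanded through the V2SF named patterns, which use V4SFmode instructions, so the unused upper half of the register is sanitized unless -fno-trapping-math is given.

```c
/* Two-element float vector (8 bytes), i.e. V2SFmode on x86-64.
   The type and function names are illustrative assumptions.  */
typedef float v2sf __attribute__ ((vector_size (8)));

/* Element-wise addition of two V2SF values.  */
v2sf
add_v2sf (v2sf a, v2sf b)
{
  return a + b;
}

/* Element-wise division.  Division is a typical case where garbage in
   the unused upper lanes of a wider instruction could raise a spurious
   floating-point exception if those lanes were not sanitized.  */
v2sf
div_v2sf (v2sf a, v2sf b)
{
  return a / b;
}
```

One way to observe the effect is to compile this on x86-64 with -O2 -S, once as is, once with -fno-trapping-math, and once with -mno-partial-vector-fp-math, and compare the generated assembly. The exact instruction sequences depend on the GCC version and target flags.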
