[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

pheeck at gcc dot gnu.org via Gcc-bugs Thu, 24 Apr 2025 04:38:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900


Filip Kastl <pheeck at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[16 regression] imagick     |[16 regression] imagick
                   |slowdown with -Ofast        |slowdown with -Ofast
                   |-march=native -fprofile-use |-march=native -fprofile-use
                   |between g:b986ed16c2546674  |since
                   |and g:e1098c7b08d9e601      |r16-39-gf6859fb621179e

--- Comment #1 from Filip Kastl <pheeck at gcc dot gnu.org> ---
Bisected to r16-39-gf6859fb621179e

commit f6859fb621179ec9bf5631eb8902619ab8d4467b
Author: Jan Hubicka <[email protected]>
Date:   Sat Apr 19 18:51:27 2025 +0200

    Add tables for SSE fp conversion costs

    As discussed, I will proceed with adding costs for common SSE operations
    which are currently globbed into the addss cost, so we do not need to set
    it incorrectly for znver5.  Looking through the stats, there are quite a
    few missing cases, so I am starting with those that I think are more
    common.  I plan to do it in smaller steps so individual changes get
    benchmarked by LNT and can also be bisected to.

    This patch adds costs for various SSE and AVX FP->FP conversions
    (extensions and truncations).  Looking through Agner Fog's tables, these
    are a bit asymmetric, so I added a cost for CVTSS2SD which is also used
    for CVTSD2SS, CVTPS2PD and CVTPD2PS, a cost for 256bit VCVTPS2PD (also used
    for the opposite direction) and a cost for the 512bit one.

    I plan to add int->int conversions next and then int->fp & fp->int, which
    are more tricky since they may bundle an inter-unit move.

    I also noticed that size tables are wrong for all SSE instructions so I
    updated them.  With some love I think vectorization can work as size
    optimization, too, but we need more work on that.

    Those values I can find in Agner Fog's tables are taken from there, the
    others are guesses (especially for yongfeng_cost and shijidadao_cost).

    gcc/ChangeLog:

            * config/i386/i386.cc (vec_fp_conversion_cost): New function.
            (ix86_rtx_costs): Use it for SSE/AVX FP conversions.
            (ix86_builtin_vectorization_cost): Fix indentation;
            and use vec_fp_conversion_cost in vec_promote_demote.
            (fp_conversion_stmt_cost): New function.
            (ix86_vector_costs::add_stmt_cost): Use it to cost NOP_EXPR
            and vec_promote_demote.
            * config/i386/i386.h (struct processor_costs):
            * config/i386/x86-tune-costs.h (struct processor_costs):
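
For readers unfamiliar with the cost tables, the width-based selection that the
commit message describes could be sketched roughly as follows.  This is not the
code from the commit: the struct `fp_conv_costs`, its field names and the
function `fp_conversion_cost` are hypothetical stand-ins.  The only thing taken
from the message is the idea of one cost shared by the scalar and 128-bit
conversions, one for 256-bit and one for 512-bit.

```cpp
// Hypothetical, simplified stand-in for a per-CPU cost table.
// The field names are illustrative assumptions, not GCC's actual layout.
struct fp_conv_costs
{
  int cvtss2sd;      // scalar and 128-bit FP<->FP conversions
  int vcvtps2pd256;  // 256-bit conversions, either direction
  int vcvtps2pd512;  // 512-bit conversions, either direction
};

// Return the cost of an FP->FP conversion for a given operand width in bits.
// Widths below 256 share the CVTSS2SD entry, as the commit message describes
// for CVTSD2SS, CVTPS2PD and CVTPD2PS.
int
fp_conversion_cost (const fp_conv_costs &costs, int size_in_bits)
{
  if (size_in_bits < 256)
    return costs.cvtss2sd;
  if (size_in_bits < 512)
    return costs.vcvtps2pd256;
  return costs.vcvtps2pd512;
}
```

A caller would fill the table once per tuning target and query it with the
vector width, for example `fp_conversion_cost (costs, 256)`.  Any concrete
numbers placed in such a table are the caller's own choice; none are given in
this report.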
