https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
Filip Kastl <pheeck at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[16 regression] imagick     |[16 regression] imagick
                   |slowdown with -Ofast        |slowdown with -Ofast
                   |-march=native -fprofile-use |-march=native -fprofile-use
                   |between g:b986ed16c2546674  |since
                   |and g:e1098c7b08d9e601      |r16-39-gf6859fb621179e
--- Comment #1 from Filip Kastl <pheeck at gcc dot gnu.org> ---
Bisected to r16-39-gf6859fb621179e
commit f6859fb621179ec9bf5631eb8902619ab8d4467b
Author: Jan Hubicka <[email protected]>
Date: Sat Apr 19 18:51:27 2025 +0200
Add tables for SSE fp conversion costs
As discussed, I will proceed with adding costs for common SSE operations
which are currently globbed into the addss cost, so we do not need to set
it incorrectly for znver5. Looking through the stats, there are quite a few
missing cases, so I am starting with those that I think are more common. I
plan to do it in smaller steps so individual changes get benchmarked by LNT
and can also be bisected to.
This patch adds costs for various SSE and AVX FP->FP conversions
(extensions and truncations). Looking through Agner Fog's tables, these are
a bit asymmetric, so I added a cost for CVTSS2SD which is also used for
CVTSD2SS, CVTPS2PD and CVTPD2PS, a cost for 256bit VCVTPS2PD (also used for
the opposite direction) and a cost for the 512bit one.
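A minimal sketch of the width-based cost selection described above, not the
actual code from the patch. The struct, its field names, the function name
and the numeric values are all hypothetical placeholders chosen for
illustration; in particular the numbers are not taken from Agner Fog's
tables or from any GCC cost table.

```cpp
// Hypothetical stand-in for the new conversion-cost entries (assumption:
// one entry shared by scalar and 128-bit forms, one for 256-bit, one for
// 512-bit, as the commit message describes).
struct fp_conversion_costs {
  int scalar_or_128;  // CVTSS2SD, CVTSD2SS, CVTPS2PD, CVTPD2PS
  int vec256;         // 256-bit conversion, either direction
  int vec512;         // 512-bit conversion, either direction
};

// Placeholder values, for illustration only.
const fp_conversion_costs placeholder_costs = {3, 4, 6};

// Pick the table entry for an FP->FP conversion from the vector width in
// bits. Extension and truncation share an entry in this sketch.
int fp_conversion_cost(const fp_conversion_costs &costs, int bits) {
  if (bits <= 128)
    return costs.scalar_or_128;
  if (bits <= 256)
    return costs.vec256;
  return costs.vec512;
}
```

With the placeholder table, a scalar conversion and a 128-bit conversion
get the same cost, while 256-bit and 512-bit conversions each use their own
entry.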
I plan to add int->int conversions next and then int->fp & fp->int, which
are trickier since they may bundle an inter-unit move.
I also noticed that the size tables are wrong for all SSE instructions, so
I updated them. With some love I think vectorization can work as a size
optimization, too, but we need more work on that.
The values I could find in Agner Fog's tables are taken from there; the
others are guesses (especially for yongfeng_cost and shijidadao_cost).
gcc/ChangeLog:
* config/i386/i386.cc (vec_fp_conversion_cost): New function.
(ix86_rtx_costs): Use it for SSE/AVX FP conversions.
(ix86_builtin_vectorization_cost): Fix indentation;
and use vec_fp_conversion_cost in vec_promote_demote.
(fp_conversion_stmt_cost): New function.
(ix86_vector_costs::add_stmt_cost): Use it to cost NOP_EXPR
and vec_promote_demote.
* config/i386/i386.h (struct processor_costs):
* config/i386/x86-tune-costs.h (struct processor_costs):