https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122586
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2026-01-14
     Ever confirmed|0                           |1
            Summary|[16 Regression] 4-5%        |[16 Regression] 4-5%
                   |slowdown of 538.imagick_r   |slowdown of 538.imagick_r
                   |on Intel Ice Lake (3rd      |on Intel Ice Lake (3rd
                   |generation Xeon)            |generation Xeon) by
                   |                            |r16-4576-gfe9f0719d8ebd2
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed on a Zen4 machine. Reverting r16-4576-gfe9f0719d8ebd2 fixes it.
perf shows (GCC 15.2 vs. trunk):
Overhead  Samples  Command          Shared Object            Symbol
  25.52%   236921  imagick_r_peak.  imagick_r_peak.gcc7-m64  [.] GetVirtualPixelsFromNexus
  24.58%   228166  imagick_r_base.  imagick_r_base.gcc7-m64  [.] GetVirtualPixelsFromNexus
  16.56%   149491  imagick_r_peak.  imagick_r_peak.gcc7-m64  [.] MorphologyApply
  16.44%   147132  imagick_r_base.  imagick_r_base.gcc7-m64  [.] MorphologyApply
   7.51%    69689  imagick_r_peak.  imagick_r_peak.gcc7-m64  [.] MeanShiftImage
   6.34%    58862  imagick_r_base.  imagick_r_base.gcc7-m64  [.] MeanShiftImage
   0.66%     6104  imagick_r_peak.  imagick_r_peak.gcc7-m64  [.] GetOneCacheViewVirtualPixel
   0.36%     3347  imagick_r_base.  imagick_r_base.gcc7-m64  [.] GetOneCacheViewVirtualPixel
where MeanShiftImage has

  status=GetOneCacheViewVirtualPixel(pixel_view,
    (ssize_t) MagickRound(mean_location.x+u),
    (ssize_t) MagickRound(mean_location.y+v),&pixel,exception);
with

  static inline double MagickRound(double x)
  {
    /*
      Round the fraction to nearest integer.
    */
    if ((x-floor(x)) < (ceil(x)-x))
      return(floor(x));
    return(ceil(x));
  }
and the generated code is, old:

          │ vrndscalesd $0xa,%xmm0,%xmm0,%xmm3
       21 │ vrndscalesd $0x9,%xmm0,%xmm0,%xmm1
        2 │ vsubsd      %xmm0,%xmm3,%xmm6
      306 │ vsubsd      %xmm1,%xmm0,%xmm2
      714 │ vmovsd      %xmm0,0xe0(%rsp)
          │1786   return(ceil(x));
          │ vcmpltsd    %xmm6,%xmm2,%xmm2
      664 │ vblendvpd   %xmm2,%xmm1,%xmm3,%xmm1
vs. the new code:

        2 │ vaddsd      _IO_stdin_used+0x13698,%xmm3,%xmm0
       15 │ vmovsd      %xmm3,0xe8(%rsp)
     1269 │ vrndscalesd $0x9,%xmm0,%xmm0,%xmm0
It's not clear why the former is faster. This is possibly not the only
place where the pattern triggers, though GetVirtualPixelsFromNexus doesn't
use floor.
Needs more analysis.