https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103252

--- Comment #9 from Jason A. Donenfeld <jason at zx2c4 dot com> ---
>  When the mask registers are available for use, RA considers them and when 
> spilling to those is cheaper than to memory, it spills to them and not memory.

Yes, this is the part I don't get. If you compare the AVX-512 codegen with the
non-AVX-512 codegen, the non-AVX-512 version doesn't spill at all there. So the
choice here isn't "spill to memory" vs "spill to mask register"; it's "don't
spill" vs "spill to mask register". The latter seems clearly worse.


-------------

(As an aside, Agner reports a kind of "fast forwarding" on Tiger Lake and later:
zero-latency write-to-read for certain addresses, with stack computations among
the easy cases. Look at 'MOV r32,[m32]+MOV [m32],r32' in these InstLatX64 results
for Skylake, Ice Lake (ICX), Tiger Lake, and Alder Lake:
http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel00506E3_Skylake2_InstLatX64.txt
http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel00606A6_ICX_InstLatX64.txt
http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel00806C1_TigerLake3_InstLatX64.txt
http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel0090672_AlderLake_BC_AVX512_InstLatX64.txt
You can see the huge reduction on Tiger Lake and Alder Lake. This is entirely
irrelevant to this bug report, but I idly wonder whether at some point there'll
be a slightly different cost model for this between Ice Lake and Tiger Lake and
later.)
