https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103252
--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jason A. Donenfeld from comment #9)
> > When the mask registers are available for use, RA considers them and when
> > spilling to those is cheaper than to memory, it spills to them and not
> > memory.
>
> Yes, this is the thing I don't get. When you compare the codegen for avx512
> vs non-avx512, the non-avx512 doesn't spill at all there. So this isn't
> "spill to memory" vs "spill to mask register". This is "don't spill" vs
> "spill to mask register". And the latter seems clearly worse.

For non-avx512, because few registers are available and r132 has a short live
range, r132 is first pushed

Pushing a18(r132,l0) (cost 70) ---- (allocated as mem first)

and then, when it is popped, a free register is found for it after all:

Popping a18(r132,l0) -- assign reg 2. --------- (allocated to a register once
one is available)

For avx512, because enough registers are available, r132 is finally assigned
to the alternative class.

Looking at the whole picture, avx512 has fewer allocnos allocated to memory.

avx2:

Disposition:
   26:r82  l1     3    1:r82  l0     3   36:r89  l1     2    2:r89  l0     2
   13:r97  l0     5   37:r101 l1     1    3:r101 l0     4   27:r103 l1   mem
    4:r103 l0   mem   38:r105 l1     0    5:r105 l0     0   28:r108 l1     6
    6:r108 l0     6    0:r112 l0     0   29:r113 l1     5    7:r113 l0     5
   30:r114 l1   mem    8:r114 l0   mem   31:r115 l1   mem    9:r115 l0   mem
   22:r118 l0     0   21:r119 l0     0   40:r128 l1     0   39:r129 l1     0
   17:r130 l0     1   16:r131 l0     2   18:r132 l0     2   15:r136 l0     1
   12:r139 l0     0   32:r142 l1   mem   10:r142 l0   mem   33:r143 l1     4
   20:r143 l0   mem   34:r144 l1   mem   11:r144 l0   mem   35:r145 l1   mem
   19:r145 l0   mem   25:r146 l0     0   24:r147 l0     1   23:r148 l0     2
   41:r149 l1     0   14:r150 l0     0

avx512:

Disposition:
   26:r82  l1     3    1:r82  l0     3   36:r89  l1     1    2:r89  l0     2
   13:r97  l0     4   37:r101 l1     2    3:r101 l0     1   27:r103 l1   mem
    4:r103 l0   mem   38:r105 l1     0    5:r105 l0     0   28:r108 l1     6
    6:r108 l0     6    0:r112 l0     0   29:r113 l1     4    7:r113 l0     4
   30:r114 l1   mem    8:r114 l0   mem   31:r115 l1   mem    9:r115 l0   mem
   22:r118 l0     0   21:r119 l0     0   40:r128 l1     0   39:r129 l1     0
   17:r130 l0     2   16:r131 l0    68   18:r132 l0    68   15:r136 l0     2
   12:r139 l0     0   32:r142 l1   mem   10:r142 l0   mem   33:r143 l1   mem
   20:r143 l0   mem   34:r144 l1     5   11:r144 l0     5   35:r145 l1   mem
   19:r145 l0   mem   25:r146 l0     0   24:r147 l0     1   23:r148 l0     2
   41:r149 l1     0   14:r150 l0     0

So for pseudos with a short live range, we may lose the opportunity to
allocate the best register class. Maybe add a peephole2 to handle those cases
instead of tuning the RA.
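
For anyone who wants to check the comparison mechanically, here is a rough
sketch (not part of GCC) that parses the two Disposition dumps above and
counts the allocnos left in memory. The function names and the regex are
made up for illustration. It also assumes that hard register 68 is the first
mask register (k0) in the i386 numbering of this GCC version, which is worth
double-checking against i386.h.

```python
import re

# One dump entry looks like "<allocno>:r<pseudo> l<loop> <hard reg | mem>".
ENTRY = re.compile(r"(\d+):r(\d+)\s+l(\d+)\s+(mem|\d+)")

# Assumption: first mask register (k0) in this GCC's i386 register numbering.
ASSUMED_FIRST_MASK_REG = 68

AVX2 = """
26:r82 l1 3 1:r82 l0 3 36:r89 l1 2 2:r89 l0 2 13:r97 l0 5 37:r101 l1 1
3:r101 l0 4 27:r103 l1 mem 4:r103 l0 mem 38:r105 l1 0 5:r105 l0 0
28:r108 l1 6 6:r108 l0 6 0:r112 l0 0 29:r113 l1 5 7:r113 l0 5
30:r114 l1 mem 8:r114 l0 mem 31:r115 l1 mem 9:r115 l0 mem 22:r118 l0 0
21:r119 l0 0 40:r128 l1 0 39:r129 l1 0 17:r130 l0 1 16:r131 l0 2
18:r132 l0 2 15:r136 l0 1 12:r139 l0 0 32:r142 l1 mem 10:r142 l0 mem
33:r143 l1 4 20:r143 l0 mem 34:r144 l1 mem 11:r144 l0 mem 35:r145 l1 mem
19:r145 l0 mem 25:r146 l0 0 24:r147 l0 1 23:r148 l0 2 41:r149 l1 0
14:r150 l0 0
"""

AVX512 = """
26:r82 l1 3 1:r82 l0 3 36:r89 l1 1 2:r89 l0 2 13:r97 l0 4 37:r101 l1 2
3:r101 l0 1 27:r103 l1 mem 4:r103 l0 mem 38:r105 l1 0 5:r105 l0 0
28:r108 l1 6 6:r108 l0 6 0:r112 l0 0 29:r113 l1 4 7:r113 l0 4
30:r114 l1 mem 8:r114 l0 mem 31:r115 l1 mem 9:r115 l0 mem 22:r118 l0 0
21:r119 l0 0 40:r128 l1 0 39:r129 l1 0 17:r130 l0 2 16:r131 l0 68
18:r132 l0 68 15:r136 l0 2 12:r139 l0 0 32:r142 l1 mem 10:r142 l0 mem
33:r143 l1 mem 20:r143 l0 mem 34:r144 l1 5 11:r144 l0 5 35:r145 l1 mem
19:r145 l0 mem 25:r146 l0 0 24:r147 l0 1 23:r148 l0 2 41:r149 l1 0
14:r150 l0 0
"""


def parse_disposition(text):
    """Return a list of (allocno, pseudo, loop, location) tuples.

    location is the string "mem" or the hard register number as an int.
    """
    return [
        (int(a), int(p), int(l), loc if loc == "mem" else int(loc))
        for a, p, l, loc in ENTRY.findall(text)
    ]


def count_mem(entries):
    """Number of allocnos whose disposition is memory."""
    return sum(1 for entry in entries if entry[3] == "mem")


def pseudos_at_or_above(entries, first_hard_reg):
    """Sorted pseudos assigned to a hard register >= first_hard_reg."""
    return sorted(
        {p for _, p, _, loc in entries if loc != "mem" and loc >= first_hard_reg}
    )


if __name__ == "__main__":
    for name, dump in (("avx2", AVX2), ("avx512", AVX512)):
        entries = parse_disposition(dump)
        print(name, "allocnos:", len(entries), "in mem:", count_mem(entries))
    print(
        "avx512 pseudos in (assumed) mask regs:",
        pseudos_at_or_above(parse_disposition(AVX512), ASSUMED_FIRST_MASK_REG),
    )
```

On these two dumps the count is 13 allocnos in memory for avx2 against 12 for
avx512, and r131 and r132 are the only pseudos placed in hard register 68.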