https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880
--- Comment #9 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to [email protected] from comment #8)
> On Mon, 22 Jun 2026, liuhongt at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880
> >
> > --- Comment #7 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > (In reply to Hongtao Liu from comment #6)
> > > > For the cases above the code comes from the vec_init expander but I can
> > > > imagine this might be too early for a perfect decision.
> > >
> > > it comes from ix86_expand_vector_init_interleave which uses SImode for
> > > V*HI/V*QImode for vec_init_0.
> > >
> >
> > By the time of ix86_expand_vector_init, we don't know if the source is from
> > memory or a GPR.
> > - For memory, pinsrw/pinsrb probably is a win.
> > - For register, pinsrw/pinsrb from r32 should be worse than vmovd for port
> > pressure on Intel P-core, but OK for E-core. For Zen: pinsr* is 2 uops vs 1
> > (latency roughly equal); Zen5 gives pinsr great throughput (0.25) but vmovd
> > is still fewer uops.
>
> Yes, as said RTL expansion is likely too early. We'd want some kind of
> peephole/splitter or an extension to STV? Ideally saving the GPR
> use before RA.
Maybe add a define_split for the specific patterns generated by vec_init, e.g. the one combine fails to match here:

Trying 57, 59 -> 62:
   57: r204:HI=[r98:DI]
   59: r205:V4SI=vec_merge(vec_duplicate(r204:HI#0),const_vector,0x1)
      REG_DEAD r204:HI
   62: r206:V8HI=vec_merge(vec_duplicate([r300:DI*0x2+r98:DI]),r205:V4SI#0,0x2)
      REG_DEAD r205:V4SI
Failed to match this instruction:
(set (reg:V8HI 206)
    (vec_merge:V8HI (subreg:V8HI (vec_merge:V4SI (vec_duplicate:V4SI (subreg:SI (mem:HI (reg:DI 98 [ ivtmp.30 ]) [1 MEM[(short int *)_28]+0 S2 A16]) 0))
                (const_vector:V4SI [
                        (const_int 0 [0]) repeated x4
                    ])
                (const_int 1 [0x1])) 0)
        (vec_duplicate:V8HI (mem:HI (plus:DI (mult:DI (reg:DI 300 [ _109 ])
                        (const_int 2 [0x2]))
                    (reg:DI 98 [ ivtmp.30 ])) [1 MEM[(short int *)_28 + _48 * 2]+0 S2 A16]))
        (const_int 253 [0xfd])))
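To make the combine dump easier to read, here is a minimal Python model of the RTL operations it uses (vec_merge, vec_duplicate and the V4SI to V8HI subreg). It is only a sketch, not GCC code: the helper names (insn59, insn62, combined) are hypothetical, and it assumes x86 little-endian lane order and models the undefined upper bits of the paradoxical subreg r204:HI#0 as an arbitrary value. It checks that the pattern combine failed to match (operands swapped, mask 0xfd = 253) computes the same V8HI as insns 59 and 62, and that those undefined bits are always overwritten by the element-1 load, which a define_split for this pattern could rely on.

```python
# Hypothetical model of the RTL in the dump above; not GCC code.
# Assumption: x86 little-endian lane order.

def vec_merge(a, b, mask):
    """Lane i comes from a if bit i of mask is set, otherwise from b."""
    assert len(a) == len(b)
    return [a[i] if (mask >> i) & 1 else b[i] for i in range(len(a))]

def vec_duplicate(x, n):
    """Broadcast scalar x into an n-lane vector."""
    return [x] * n

def v4si_to_v8hi(v):
    """(subreg:V8HI (reg:V4SI) 0): each 32-bit lane becomes two 16-bit
    lanes, low half first."""
    out = []
    for s in v:
        out.append(s & 0xFFFF)
        out.append((s >> 16) & 0xFFFF)
    return out

def insn59(x0, garbage_hi):
    # r204:HI#0 is a paradoxical SImode subreg: bits 16..31 are undefined,
    # modeled here by the arbitrary value garbage_hi.
    si = ((garbage_hi & 0xFFFF) << 16) | (x0 & 0xFFFF)
    # Lane 0 from the duplicate, lanes 1..3 zero (movd-like).
    return vec_merge(vec_duplicate(si, 4), [0, 0, 0, 0], 0x1)

def insn62(r205, x1):
    # Lane 1 from the duplicated memory load, the rest from r205 viewed as V8HI.
    return vec_merge(vec_duplicate(x1, 8), v4si_to_v8hi(r205), 0x2)

def combined(x0, x1, garbage_hi):
    # The "Failed to match" pattern: operands swapped, mask 0xfd (~0x2 & 0xff).
    r205 = insn59(x0, garbage_hi)
    return vec_merge(v4si_to_v8hi(r205), vec_duplicate(x1, 8), 0xFD)

if __name__ == "__main__":
    for garbage in (0x0000, 0xDEAD):
        a = insn62(insn59(0x1234, garbage), 0xBEEF)
        b = combined(0x1234, 0xBEEF, garbage)
        # Same value either way; the undefined bits never reach the result.
        assert a == b == [0x1234, 0xBEEF, 0, 0, 0, 0, 0, 0]
    print("ok")
```

Because lane 1 of the V8HI always comes from the second load, the result only depends on the two HImode memory operands, so the sequence is semantically a two-element vec_init from memory.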
