https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 13 Jun 2023, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
>
> --- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
>
> > Can x86 do this?  We'd want to apply this to a scalar, so move ivtmp
> > to xmm, apply pack_usat or as you say below, the non-existing us_trunc
> > and then broadcast.
>
> I see, we don't have scalar version. Also vector instruction looks not very
> fast.
>
> https://uops.info/html-instr/VPMOVSDB_XMM_XMM.html

Uh, yeah.  Well, Zen4 looks reasonable though latency could be better.

Preliminary performance data also shows masked epilogues are a
mixed bag.  I'll finish off the implementation and then we'll see
if we can selectively enable it for the profitable cases somehow.
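
A minimal sketch of the "saturating truncate, then broadcast" idea quoted
above, written as portable scalar C rather than x86 intrinsics.  It assumes
the goal is a per-lane mask for a masked epilogue, built by comparing the
remaining iteration count against each lane index; that reading is an
assumption, not something the comment spells out.  The names us_trunc_u8
and epilogue_mask and the 16-lane limit are hypothetical and are not GCC
internals or an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating truncation of a wide count to 8 bits: values
   above 255 clamp to 255 instead of wrapping around.  */
static uint8_t
us_trunc_u8 (uint64_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}

/* Model of a masked epilogue for up to 16 byte lanes (hypothetical
   width).  The saturated count is broadcast to every lane and compared
   with the lane index; bit i of the result means lane i is active.  */
static uint16_t
epilogue_mask (uint64_t remaining, unsigned lanes)
{
  uint8_t bcast[16];
  uint16_t mask = 0;
  uint8_t sat = us_trunc_u8 (remaining);

  assert (lanes <= 16);

  /* Broadcast step: every lane receives the same saturated count.  */
  for (unsigned i = 0; i < lanes; i++)
    bcast[i] = sat;

  /* Compare step: lane i is active while i < remaining.  */
  for (unsigned i = 0; i < lanes; i++)
    if (i < bcast[i])
      mask |= (uint16_t) (1u << i);

  return mask;
}
```

The saturation is what keeps the narrow compare correct: with a plain
wrapping truncation a remaining count of 256 would become 0 and disable
every lane, whereas the clamped value 255 still enables all of them.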