https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 13 Jun 2023, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
> 
> --- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
> 
> > Can x86 do this?  We'd want to apply this to a scalar, so move ivtmp
> > to xmm, apply pack_usat or as you say below, the non-existing us_trunc
> > and then broadcast.
> 
> I see, we don't have scalar version. Also vector instruction looks not very
> fast.
> 
> https://uops.info/html-instr/VPMOVSDB_XMM_XMM.html

Uh, yeah.  Well, Zen4 looks reasonable though latency could be better.

Preliminary performance data also shows masked epilogues are a
mixed bag.  I'll finish off the implementation and then we'll see
if we can selectively enable it for the profitable cases somehow.
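
As a rough illustration of why a saturating truncation of the scalar IV
matters here (a sketch only, not taken from any patch): if the remaining
iteration count is narrowed to a smaller element type before being
compared against the lane indices, a plain truncation can wrap and yield
a wrong mask, whereas an unsigned saturating one cannot. The names
us_trunc_u32_u8 and lane_mask are hypothetical, and the 32-bit to 8-bit
narrowing is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar "us_trunc": unsigned saturating truncation
   from 32 bits to 8 bits.  */
static uint8_t us_trunc_u32_u8(uint32_t x)
{
    return x > UINT8_MAX ? UINT8_MAX : (uint8_t)x;
}

/* Hypothetical model of mask generation for a masked loop or epilogue:
   lane i is active iff i < remaining, with the comparison done in the
   narrow 8-bit type.  Bit i of the result corresponds to lane i.  */
static uint32_t lane_mask(uint32_t remaining, unsigned lanes)
{
    assert(lanes <= 32);
    /* The scalar is narrowed once, then conceptually broadcast.  */
    uint8_t limit = us_trunc_u32_u8(remaining);
    uint32_t mask = 0;
    for (unsigned i = 0; i < lanes; i++)
        if ((uint8_t)i < limit)
            mask |= UINT32_C(1) << i;
    return mask;
}
```

With remaining == 256 and 16 lanes the saturated limit is 255, so all
lanes stay active; a wrapping truncation would give 0 and disable every
lane.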
