https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91735

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 11 Sep 2019, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91735
> 
> --- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
> (In reply to Richard Biener from comment #3)
> > > Reducing the VF here should be the goal.  For the particular case, "filling"
> > > the holes with neutral data and blending in the original values at store
> > > time will likely be optimal.  So do
> > 
> >   tem = vector load
> >   zero all [4] elements
> >   compute
> >   blend in 'tem' into the [4] elements
> >   vector store
> 
> MASKMOVDQU [1] should be an excellent fit here.

Yes, but it's probably slower.  On the plus side it avoids store data races,
of course, and (eventually) avoids epilogue peeling.
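
For illustration only, the load/zero/compute/blend/store sequence quoted
above could be sketched in portable C roughly as follows.  This is not
what the vectorizer emits: the four-lane width, the "add a constant"
computation and the names update_with_blend, active and delta are all
assumptions made up for the example, and plain loops stand in for the
vector instructions.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };  /* assumed vector width, for illustration only */

/* Hypothetical helper: add `delta` to the lanes marked in `active`
   and leave the remaining lanes (the "holes") unchanged in memory.  */
static void
update_with_blend (uint8_t *p, const uint8_t active[LANES], uint8_t delta)
{
  uint8_t tem[LANES];
  uint8_t work[LANES];
  int i;

  /* tem = vector load  */
  memcpy (tem, p, LANES);

  /* Zero the hole lanes so the computation sees neutral data.  */
  for (i = 0; i < LANES; i++)
    work[i] = active[i] ? tem[i] : 0;

  /* Compute on every lane, holes included.  */
  for (i = 0; i < LANES; i++)
    work[i] = (uint8_t) (work[i] + delta);

  /* Blend the original values from 'tem' back into the hole lanes.  */
  for (i = 0; i < LANES; i++)
    work[i] = active[i] ? work[i] : tem[i];

  /* Full-width vector store: the hole lanes are rewritten with their
     old values, which is where the store data race comes from.  */
  memcpy (p, work, LANES);
}
```

A masked store would instead write only the active lanes, so the holes
are never touched in memory, at the cost of the (probably) slower
instruction.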
