memset not vectorized when equivalent loop is

rguenther at suse dot de Tue, 22 Sep 2015 07:06:01 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65965


--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 22 Sep 2015, alalaw01 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65965
> 
> --- Comment #4 from alalaw01 at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #3)
> > Fixed for GCC 6.
> 
> Indeed. I note that the same testcase does _not_ SLP/vectorize if I use
> consecutive indices:
> 
> void
> test (int*__restrict a, int*__restrict b)
> {
>     a[0] = b[0];
>     a[1] = b[1];
>     a[2] = b[2];
>     a[3] = b[3];
>     a[4] = 0;
>     a[5] = 0;
>     a[6] = 0;
>     a[7] = 0;
> }
> 
> loop26a.c:6:13: note: Build SLP failed: different operation in stmt MEM[(int *)a_4(D) + 28B] = 0;
> loop26a.c:6:13: note: original stmt *a_4(D) = _3;
> loop26a.c:6:13: note: === vect_slp_analyze_data_ref_dependences ===
> loop26a.c:6:13: note: === vect_slp_analyze_operations ===
> loop26a.c:6:13: note: not vectorized: bad operation in basic block.
> 
> Worth another bug?

The above looks as if SLP is trying a vector size of v8si.  It
_should_ work for v4si.  For v8si we indeed can't vectorize this
as we don't support "partial" loads.  We could vectorize with
masked loads, and IIRC on x86_64 the masked elements can be
initialized to 0 or -1, so we can OR in the constant pieces.

Not sure if that's worth another bug; please double-check your
vector size first.

[Bug middle-end/65965] Straight-line memcpy/memset not vectorized when equivalent loop is
