https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734

--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> probably -fwhole-program is enough, -flto not needed(?)

Yes, -fwhole-program is sufficient.

> 
>   # vectp_g.248_1401 = PHI <vectp_g.248_1402(32), &g(143)>
> ...
>   _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]);
> ..
>   vect__193.250_1403 = .MASK_LEN_LOAD (vectp_g.248_1401, 32B, { -1, ... },
> _1411, 0);
>   vect__194.251_1404 = -vect__193.250_1403;
>   vect_iftmp.252_1405 = (vector([2,2]) long int) vect__194.251_1404;
> 
>   # vect_iftmp.252_1406 = PHI <vect_iftmp.252_1405(5)>
>   # loop_len_1427 = PHI <_1411(5)>
> ...
>   _1407 = loop_len_1427 + 18446744073709551615;
>   _1408 = .VEC_EXTRACT (vect_iftmp.252_1406, _1407);
>   iftmp.3_1204 = _1408;
> 
> is stored to b[15].  Doesn't look too odd to me.

At the assembly equivalent of

>   vect__193.250_1403 = .MASK_LEN_LOAD (vectp_g.248_1401, 32B, { -1, ... },
> _1411, 0); 

we load [3 3] (=f) instead of [0 0] (=g).  f is located after g in memory, and
register a3 is incremented before the loop latch.  We then reuse a3 to load the
last two elements of g but actually read the first two elements of f.
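
To make the suspected mechanism concrete, here is a minimal C model of it.  It
is only a sketch under assumptions: the array sizes, the element type, the flat
"memory" array and the function names are all made up for illustration, and a3
is modelled as an element index rather than a byte address.  Only the values
([0 0] for g, [3 3] for f) and the fact that f follows g come from the
observation above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: g has 4 elements, f has 2, and one vector
   iteration handles VL = 2 elements.  */
enum { G_LEN = 4, F_LEN = 2, VL = 2 };

/* Flat model of memory: g occupies [0, G_LEN), f follows directly.  */
static const int memory[G_LEN + F_LEN] = { 0, 0, 0, 0, 3, 3 };

/* Models a length-controlled load of LEN elements starting at ADDR.  */
static void
load_len (int *dst, size_t addr, size_t len)
{
  assert (addr + len <= G_LEN + F_LEN);
  memcpy (dst, &memory[addr], len * sizeof *dst);
}

/* Expected: a3 still points at the last VL elements of g.  */
void
load_tail_expected (int dst[VL])
{
  size_t a3 = G_LEN - VL;
  load_len (dst, a3, VL);	/* reads g[2], g[3] */
}

/* Observed: a3 has already been bumped when it is reused.  */
void
load_tail_observed (int dst[VL])
{
  size_t a3 = G_LEN - VL;
  a3 += VL;			/* increment happens too early */
  load_len (dst, a3, VL);	/* reads f[0], f[1] instead */
}
```

In this model load_tail_expected yields [0 0] and load_tail_observed yields
[3 3], which matches what I see in the generated code.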
