https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113326

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> One more thing:
> ```
>  vect_shift_0 = vect_value >> { 0, 1, 2, 3 };
>  vect_shift_1 = vect_value >> { 4, 5, 6, 7 };
>  vect_shift_2 = vect_value >> { 8, 9, 10, 11 };
>  vect_shift_3 = vect_value >> { 12, 13, 14, 15 };
> ```
> vs
> ```
>  vect_shift_0 = vect_value >> { 0, 1, 2, 3 };
>  vect_shift_1 = vect_shift_0 >> { 4, 4, 4, 4 };
>  vect_shift_2 = vect_shift_0 >> { 8, 8, 8, 8 };
>  vect_shift_3 = vect_shift_0 >> { 12, 12, 12, 12 };
> ```
> 
> the first has fully independent operations while in the second case, there
> is one dependent and the are independent operations.
> 
> On cores which has many vector units the first one might be faster than the
> second one.  So this needs a cost model too.

Note the vectorizer has the shift values dependent as well (across iterations),
we just constant propagate after unrolling here.

Note this is basically asking for "strength-reduction" of expensive
constants which could be more generally useful and not only for this
specific shift case.  Consider the same example but with an add instead
of a shift for example, the same exact set of constants will appear.

Reply via email to