Hello,

Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
> when vectorizing 4 times, we sometimes do
>   for
>     <4x vectorized body>
>   for
>     <2x vectorized body>
>   for
>     <1x vectorized body>
>
> Here the second two fors, which handle the epilogue, never iterate.
> Currently the vectorizer thinks that the middle for iterates twice.
> This turns out to be scale_profile_for_vect_loop that uses
> niter_for_unrolled_loop.
>
> At that time we know the epilogue will iterate at most 2 times,
> but niter_for_unrolled_loop does not know that the last iteration
> will be taken by the epilogue-of-epilogue and thus it thinks
> that the loop may iterate once and exit in the middle of the second
> iteration.
>
> We already do a correct job updating niter bounds and this is
> just an ordering issue.  This patch makes us first update
> the bounds and then update the loop.  I re-implemented
> the function more correctly and precisely.
>
> The loop reducing the iteration factor for overly flat profiles is a bit
> funny, but the only other method I can think of is to compute an sreal
> scale, which would have similar overhead I think.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
> gcc/ChangeLog:
>
>       PR middle-end/110649
>       * tree-vect-loop.cc (scale_profile_for_vect_loop):
>       (vect_transform_loop):
>       (optimize_mask_stores):
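
For readers following the quoted description, the iteration counts can be
sketched as below. This is an illustration only, not GCC code: struct
trip_counts, split_iterations and check_epilogue_bounds are hypothetical
names, and the sketch assumes a plain scalar trip count n with no peeling,
masking or alignment handling.

```c
#include <assert.h>

/* Hypothetical illustration, not part of GCC: how n scalar iterations
   are distributed over a 4x vectorized main loop, a 2x vectorized
   epilogue and a 1x epilogue-of-epilogue.  */
struct trip_counts
{
  unsigned main4;  /* iterations of the 4x vectorized body */
  unsigned epi2;   /* iterations of the 2x vectorized body */
  unsigned epi1;   /* iterations of the 1x body */
};

static struct trip_counts
split_iterations (unsigned n)
{
  struct trip_counts t;
  unsigned rem = n % 4;   /* at most 3 scalar iterations reach the epilogues */

  t.main4 = n / 4;
  t.epi2 = rem / 2;       /* the 2x loop covers at most 2 of them */
  t.epi1 = rem % 2;       /* the last odd iteration goes to the 1x loop */
  return t;
}

/* Check that, under the assumptions above, neither epilogue loop runs
   its body more than once for any trip count below max_n.  */
static void
check_epilogue_bounds (unsigned max_n)
{
  for (unsigned n = 0; n < max_n; n++)
    {
      struct trip_counts t = split_iterations (n);
      assert (t.epi2 <= 1);
      assert (t.epi1 <= 1);
      assert (4 * t.main4 + 2 * t.epi2 + t.epi1 == n);
    }
}
```

With a remainder of 3, for example, the 2x loop runs once and the 1x loop
runs once. A bound that ignores the 1x loop would instead allow the 2x loop
to start a second, partial iteration.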

Our CI detected regressions on aarch64-linux-gnu with this commit in
gcc.target/aarch64/sve/aarch64-sve.exp. I checked today's trunk and it
still fails. I filed the following bug report with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110727

Could you please check?

-- 
Thiago
