Hello,

Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
> when vectorizing 4 times, we sometimes do
>   for
>     <4x vectorized body>
>   for
>     <2x vectorized body>
>   for
>     <1x vectorized body>
>
> Here the second two fors handling the epilogue never iterate.
> Currently the vectorizer thinks that the middle for iterates twice.
> This turns out to be scale_profile_for_vect_loop, which uses
> niter_for_unrolled_loop.
>
> At that time we know the epilogue will iterate at most 2 times,
> but niter_for_unrolled_loop does not know that the last iteration
> will be taken by the epilogue-of-epilogue, and thus it thinks
> that the loop may iterate once and exit in the middle of the second
> iteration.
>
> We already do a correct job updating niter bounds, and this is
> just an ordering issue.  This patch makes us first update
> the bounds and then do the updating of the loop.  I re-implemented
> the function more correctly and precisely.
>
> The loop reducing the iteration factor for overly flat profiles is a bit
> funny, but the only other method I can think of is to compute an sreal
> scale, which would have similar overhead, I think.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
> gcc/ChangeLog:
>
> 	PR middle-end/110649
> 	* tree-vect-loop.cc (scale_profile_for_vect_loop):
> 	(vect_transform_loop):
> 	(optimize_mask_stores):

Our CI detected regressions on aarch64-linux-gnu with this commit in
gcc.target/aarch64/sve/aarch64-sve.exp. I checked today's trunk and
it still fails.

I filed the following bug report with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110727

Could you please check?

-- 
Thiago