https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Index: gcc/passes.def
===================================================================
--- gcc/passes.def      (revision 235237)
+++ gcc/passes.def      (working copy)
@@ -244,6 +244,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
+      NEXT_PASS (pass_lim);
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
@@ -257,10 +258,8 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_fix_loops);
       NEXT_PASS (pass_tree_loop);
       PUSH_INSERT_PASSES_WITHIN (pass_tree_loop)
-         NEXT_PASS (pass_tree_loop_init);
-         NEXT_PASS (pass_lim);
-         NEXT_PASS (pass_copy_prop);
          NEXT_PASS (pass_dce);
+         NEXT_PASS (pass_tree_loop_init);
          NEXT_PASS (pass_tree_unswitch);
          NEXT_PASS (pass_scev_cprop);
          NEXT_PASS (pass_record_bounds);

will then vectorize the loop successfully.

I'm quite sympathetic to moving LIM before PRE (removing copy-prop might
pessimize -O1 where PRE doesn't run, but -O1 should use a different
pipeline, maybe based on the -Og one...).

It also moves LIM out of -ftree-loop-optimize (the oacc kernels group
also runs LIM independently of -ftree-loop-optimize).
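
For readers less familiar with LIM (loop invariant motion), here is a minimal sketch of the kind of transformation the pass performs. This is a hypothetical illustration, not the testcase from this PR; the function names fill_naive and fill_hoisted are made up for the example. The second function is a hand-written approximation of what the first might look like after invariant motion.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from this PR. */

/* As written: scale * bias is recomputed on every iteration although
   neither operand changes inside the loop.  Unsigned arithmetic is
   used so the example has no overflow-related undefined behavior.  */
void fill_naive(unsigned *dst, size_t n, unsigned scale, unsigned bias)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned)i + scale * bias;
}

/* Hand-hoisted form: the invariant product is computed once before
   the loop, which is roughly the shape invariant motion aims for.  */
void fill_hoisted(unsigned *dst, size_t n, unsigned scale, unsigned bias)
{
    const unsigned k = scale * bias;
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned)i + k;
}
```

Both functions store the same values; only the placement of the invariant computation differs. Whether a given compiler version actually hoists it, and in which pass, depends on the pipeline ordering discussed above.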