https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #35)
> (In reply to H.J. Lu from comment #33)
> > Created attachment 61995 [details]
> > An updated patch
> >
> > Please try this.
>
> Looking at the patch I do wonder about
>
> static void
> ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
>                               rtx inner_scalar = nullptr)
> {
>   basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
>   while (bb->loop_father->latch
>          != EXIT_BLOCK_PTR_FOR_FN (cfun))
>     bb = get_immediate_dominator (CDI_DOMINATORS,
>                                   bb->loop_father->header);
>
> when the nearest common dominator is a BB in a loop nest like
>
>   loop {
>     loop {
>     }
>
>     loop {
>       BB;
>     }
>     BB';
>   }
>
> this will skip an arbitrary number of earlier sibling loops.  I think
> if we want to do such additional hoisting at all - for a splat of a
> non-constant we have to ensure the set of the source we splat is still
> dominating the insertion point (where's that done?) - it IMO only
> makes sense (without extra costing) to hoist the set out of a perfect
> nest, thus never across earlier sibling loops.  Even for BB' this is
> likely problematic.

Since my patch works, I'd like to keep it as is.  Will it work for you?
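
For reference, the walk discussed in the quoted text can be traced on a toy model of that loop nest. This is only a sketch under stated assumptions, not GCC code: `ToyBlock`, `ToyLoop`, `hoist_trace`, `demo_trace` and the block names are hypothetical stand-ins, the dominator tree is written out by hand, and the `is_root` flag models the `latch != EXIT_BLOCK_PTR_FOR_FN (cfun)` test on the assumption that only the root of the loop tree has the exit block as its latch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's basic_block and loop structures.
struct ToyLoop;

struct ToyBlock {
  std::string name;
  ToyBlock *idom;   // immediate dominator (nullptr for the entry block)
  ToyLoop *loop;    // innermost enclosing loop, like bb->loop_father
};

struct ToyLoop {
  std::string name;
  ToyBlock *header;
  bool is_root;     // assumption: models latch == EXIT_BLOCK_PTR_FOR_FN (cfun)
};

// Same shape as the quoted while loop: while the block is inside a real
// loop, jump to the immediate dominator of that loop's header.
// Returns the names of all blocks visited, starting block first.
std::vector<std::string> hoist_trace(ToyBlock *bb) {
  std::vector<std::string> trace{bb->name};
  while (!bb->loop->is_root) {
    bb = bb->loop->header->idom;
    trace.push_back(bb->name);
  }
  return trace;
}

// Builds the nest from the quoted text: an outer loop containing two
// sibling inner loops, with BB inside the second inner loop.
std::vector<std::string> demo_trace() {
  ToyLoop root{"root", nullptr, true};
  ToyLoop outer{"outer", nullptr, false};
  ToyLoop inner1{"inner1", nullptr, false};
  ToyLoop inner2{"inner2", nullptr, false};

  // Hand-written dominator chain:
  // entry -> outer_header -> inner1_header -> between -> inner2_header -> BB
  ToyBlock entry{"entry", nullptr, &root};
  ToyBlock outer_header{"outer_header", &entry, &outer};
  ToyBlock inner1_header{"inner1_header", &outer_header, &inner1};
  ToyBlock between{"between", &inner1_header, &outer};
  ToyBlock inner2_header{"inner2_header", &between, &inner2};
  ToyBlock bb{"BB", &inner2_header, &inner2};

  outer.header = &outer_header;
  inner1.header = &inner1_header;
  inner2.header = &inner2_header;

  return hoist_trace(&bb);
}
```

In this model `demo_trace()` visits `BB`, then `between`, then `entry`: the walk leaves the second inner loop, then leaves the outer loop, and ends in front of the whole nest, which also puts it ahead of the earlier sibling loop `inner1`.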