https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> >
> > --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> >
> > I am considering whether we should disable LICM for RISC-V by default if
> > vector is enabled ?
> > Since the compile time explode 10 times is really horrible.
>
> I think that's a bad idea.  It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
>
> There's already param_loop_invariant_max_bbs_in_loop:
>
>   /* Process the loops, innermost first.  */
>   for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
>     {
>       curr_loop = loop;
>       /* move_single_loop_invariants for very large loops is time consuming
>          and might need a lot of memory.  For -O1 only do loop invariant
>          motion for very small loops.  */
>       unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
>       if (optimize < 2)
>         max_bbs /= 10;
>       if (loop->num_nodes <= max_bbs)
>         move_single_loop_invariants (loop);
>     }
>
> it might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that).  And when the number of innermost loops also exceeds
> the limit avoid even that?  The above also misses a
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).

Ah, sorry - I was mis-matching LICM to invariant motion above, still invariant
motion is the biggest offender (might be due to DF checking if you enabled
that).

As for sbitmap vs. bitmap it's a difficult call.  When there are big profile
hits on individual bit operations (bitmap_bit_p, bitmap_set_bit) it might pay
off to use bitmap but with tree view.  There's also sparseset but that
requires even more memory.
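
For illustration, the restriction suggested in the quoted comment #26 could be
modelled roughly as below.  This is a standalone toy sketch, not GCC code:
ToyLoop, select_loops and the max_loops limit are hypothetical names made up
here, and only the max_bbs handling mirrors the quoted snippet.  It assumes
that "too large" means the loop count exceeds max_loops, and that the same
limit is applied to the number of innermost loops.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a loop; this is NOT GCC's class loop.
struct ToyLoop
{
  unsigned num_nodes;  // number of basic blocks in the loop
  bool innermost;      // true if the loop contains no nested loop
};

// Return the indices of the loops invariant motion would process.
// max_loops is a HYPOTHETICAL new limit; the thread only proposes
// that such a param could be added.
std::vector<std::size_t>
select_loops (const std::vector<ToyLoop> &loops, int optimize,
              unsigned max_bbs, std::size_t max_loops)
{
  // Existing size limit from the quoted code: stricter below -O2.
  if (optimize < 2)
    max_bbs /= 10;

  std::size_t n_innermost = 0;
  for (const ToyLoop &l : loops)
    if (l.innermost)
      ++n_innermost;

  const bool too_many_loops = loops.size () > max_loops;

  // Even the innermost loops alone exceed the limit: process nothing.
  if (too_many_loops && n_innermost > max_loops)
    return {};

  std::vector<std::size_t> picked;
  for (std::size_t i = 0; i < loops.size (); ++i)
    {
      // With too many loops overall, only innermost loops are considered.
      if (too_many_loops && !loops[i].innermost)
        continue;
      if (loops[i].num_nodes <= max_bbs)
        picked.push_back (i);
    }
  return picked;
}
```

Whether such a cap would actually help the testcase in this PR is untested
here; the sketch only shows how the two limits would interact.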