Hey,

On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> >        gcc_checking_assert (bb_index
> >                             < (unsigned) last_basic_block_for_fn (cfun));
> >
> > -      EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
> > -                                      0, i, bi)
> > -        {
> > +      EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > +        if (bitmap_set_bit (phi_insertion_points, i))
> >            bitmap_set_bit (work_set, i);
> > -          bitmap_set_bit (phi_insertion_points, i);
> > -        }
> >      }
>
> I don't understand why the above is better.
> Wouldn't it be best to do
>   bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
>                              phi_insertion_points);
>   bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> ?

I had the same hunch, but:

1) One would have to make work_set be non-tree-view again (which with the
   current structure is a wash anyway, and that makes sense, as accesses to
   work_set aren't heavily random here).

2) Even doing that and using bitmap_ior..._into is still measurably slower:
   on a reduced testcase with -O0 -fno-checking, the proposed structure
   (tree-view or non-tree-view work_set doesn't matter):

 tree SSA rewrite      :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 12%)    27M (  8%)

   with non-tree-view work_set and your suggestion:

 tree SSA rewrite      :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 12%)    27M (  8%)

I can only speculate that the usually extreme sparsity of the bitmaps in
question makes the setup costs of the two bitmap_ior calls more expensive
than the often-skipped second call to bitmap_set_bit in Richi's proposed
structure.  (That, or cache effects.)


Ciao,
Michael.