Hey,

On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> >        gcc_checking_assert (bb_index
> >                        < (unsigned) last_basic_block_for_fn (cfun));
> >  
> > -      EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
> > -                                 0, i, bi)
> > -   {
> > +      EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > +   if (bitmap_set_bit (phi_insertion_points, i))
> >       bitmap_set_bit (work_set, i);
> > -     bitmap_set_bit (phi_insertion_points, i);
> > -   }
> >      }
> 
> I don't understand why the above is better.
> Wouldn't it be best to do
>       bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
>                                phi_insertion_points);
>       bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> ?

I had the same hunch, but:

1) One would have to make work_set be non-tree-view again (which with the 
current structure is a wash anyway, and that makes sense as accesses to 
work_set aren't heavily random here).

2) But doing that and using bitmap_ior.._into is still measurably slower: 
on a reduced testcase with -O0 -fno-checking, proposed structure 
(tree-view or not-tree-view workset doesn't matter):

 tree SSA rewrite                   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 12%)    27M (  8%)

with non-tree-view, and your suggestion:

 tree SSA rewrite                   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 12%)    27M (  8%)

I can only speculate that the usually extreme sparsity of the bitmaps in 
question makes the setup costs of the two bitmap_ior calls more 
expensive than the often-skipped second call to bitmap_set_bit in Richi's 
proposed structure.  (That, or cache effects.)


Ciao,
Michael.
