https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78288

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
As an update, with today's trunk (96b7f495f9269d5448822e4fc28882) I see, for
a non-bootstrapped build of asan_interceptors.cc on x86_64 with -fno-checking:

 callgraph ipa passes               :   1.24 ( 11%)   0.11 ( 14%)   1.36 ( 11%)  117240 kB ( 19%)
 integrated RA                      :   0.47 (  4%)   0.06 (  8%)   0.35 (  3%)   48666 kB (  8%)
 variable tracking                  :   0.15 (  1%)   0.00 (  0%)   0.21 (  2%)   14895 kB (  2%)
 var-tracking dataflow              :   1.64 ( 14%)   0.00 (  0%)   1.57 ( 13%)     158 kB (  0%)
 var-tracking emit                  :   0.30 (  3%)   0.01 (  1%)   0.32 (  3%)    9859 kB (  2%)
 TOTAL                              :  11.51          0.76         12.27         625036 kB

so this might not be the very best example of var-tracking slowness.  I
do have a simple patch that improves it to

 variable tracking                  :   0.21 (  2%)   0.00 (  0%)   0.22 (  2%)   14895 kB (  2%)
 var-tracking dataflow              :   0.95 (  8%)   0.00 (  0%)   0.94 (  8%)     157 kB (  0%)
 var-tracking emit                  :   0.31 (  3%)   0.00 (  0%)   0.37 (  3%)    9859 kB (  2%)

That improvement is achieved by changing the vt_find_locations iteration
scheme from the worklist/pending rounds to immediately following backedges,
both for cache-locality reasons and to avoid processing parts of the
function we'd eventually have to revisit anyway due to changed DF on the
backedge destination.
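
For context, here is a minimal self-contained sketch (not GCC code) of the
round-based worklist/pending scheme vt_find_locations currently uses.
Block, compute_block and the toy transfer function are illustrative
stand-ins; ordered std::sets keyed by the RPO index play the role of the
fibheaps keyed by bb_order[]:

#include <set>
#include <utility>
#include <vector>

struct Block
{
  int order;                       /* RPO index, like bb_order[].  */
  std::vector<Block *> succs;      /* Successor blocks.  */
  int val = 0;                     /* Toy dataflow state.  */
  bool visited = false, in_worklist = false, in_pending = false;
};

/* Toy transfer function standing in for the real per-block dataflow
   computation; returns true while the block's state still changes.  */
static bool
compute_block (Block *bb)
{
  if (bb->val >= 3)
    return false;
  bb->val++;
  return true;
}

static void
solve (std::vector<Block *> &blocks)
{
  std::set<std::pair<int, Block *> > worklist, pending;

  for (Block *bb : blocks)
    {
      bb->in_pending = true;
      pending.insert (std::make_pair (bb->order, bb));
    }

  while (!pending.empty ())
    {
      /* Start a new round: swap PENDING into WORKLIST.  */
      worklist.swap (pending);
      for (Block *bb : blocks)
	{
	  bb->in_worklist = bb->in_pending;
	  bb->in_pending = false;
	  bb->visited = false;
	}

      while (!worklist.empty ())
	{
	  Block *bb = worklist.begin ()->second;
	  worklist.erase (worklist.begin ());
	  bb->in_worklist = false;
	  bb->visited = true;

	  if (!compute_block (bb))
	    continue;
	  for (Block *succ : bb->succs)
	    {
	      if (succ->visited)
		{
		  /* Already processed this round: send to the next
		     round.  This is the branch the patch removes.  */
		  if (!succ->in_pending)
		    {
		      succ->in_pending = true;
		      pending.insert (std::make_pair (succ->order, succ));
		    }
		}
	      else if (!succ->in_worklist)
		{
		  /* Add to the current round.  */
		  succ->in_worklist = true;
		  worklist.insert (std::make_pair (succ->order, succ));
		}
	    }
	}
    }
}

The patch: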

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 899a5c0290d..7929b0b39eb 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -7209,22 +7209,18 @@ vt_find_locations (void)
                      if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
                        continue;

-                     if (bitmap_bit_p (visited, e->dest->index))
-                       {
-                         if (!bitmap_bit_p (in_pending, e->dest->index))
-                           {
-                             /* Send E->DEST to next round.  */
-                             bitmap_set_bit (in_pending, e->dest->index);
-                             pending->insert (bb_order[e->dest->index],
-                                              e->dest);
-                           }
-                       }
-                     else if (!bitmap_bit_p (in_worklist, e->dest->index))
+                     if (!bitmap_bit_p (in_worklist, e->dest->index))
                        {
                          /* Add E->DEST to current round.  */
                          bitmap_set_bit (in_worklist, e->dest->index);
                          worklist->insert (bb_order[e->dest->index],
                                            e->dest);
+                         /* Clear visited in case we already visited
+                            the block.  We're following backedges
+                            immediately to improve cache locality
+                            and avoid wasting cycles on blocks we'd
+                            have to re-visit eventually.  */
+                         bitmap_clear_bit (visited, e->dest->index);
                        }
                    }
                }


All the worklist/fibheap/visited tracking could also be simplified down
to a single worklist bitmap (not sbitmap) indexed by bb_order, extracting
elements via bitmap_first_set_bit.  But that's of course unrelated to the
performance issue at hand (the visited bitmap can be eliminated, though).
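
For illustration, a sketch of that single-worklist variant, reusing the
hypothetical Block/compute_block from the sketch above; the std::set of
RPO indices stands in for a GCC bitmap whose lowest set bit would be
fetched with bitmap_first_set_bit:

static void
solve_simplified (std::vector<Block *> &blocks)
{
  /* Assumes blocks[i]->order == i, i.e. BLOCKS is in RPO order.
     Membership in WORKLIST means "needs (re)processing"; no
     visited/in_worklist/in_pending flags and no rounds are needed.  */
  std::set<int> worklist;
  for (Block *bb : blocks)
    worklist.insert (bb->order);

  while (!worklist.empty ())
    {
      /* bitmap_first_set_bit analogue: pop the lowest RPO index.  */
      int idx = *worklist.begin ();
      worklist.erase (worklist.begin ());
      Block *bb = blocks[idx];

      if (compute_block (bb))
	/* The block's OUT changed: reconsider all successors
	   immediately, whether or not they were already processed.  */
	for (Block *succ : bb->succs)
	  worklist.insert (succ->order);
    }
}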

It looks like this two-worklist approach has been there from the very
beginning, as has the use of a fibheap.
