On 10/18/18 7:09 AM, Richard Biener wrote:
> 
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
> 
> Over Stevens original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> 
> This can be used to fix the two biggest bottlenecks in the PRs
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the thrid coulumn (samples)
> when comparing percentages.
> 
> -O0
> -   18.19%    17.35%           407  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 8.77% create_coalesce_list_for_region                                 
>  ▒
>       + 4.21% calculate_live_ranges                                           
>  ▒
>       + 2.02% build_ssa_conflict_graph                                        
>  ▒
>       + 1.66% insert_phi_nodes_for                                            
>  ▒
>       + 0.86% coalesce_ssa_name                      
> patched:
> -   12.39%    10.48%           129  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 5.27% calculate_live_ranges                                           
>  ▒
>       + 2.76% insert_phi_nodes_for                                            
>  ▒
>       + 1.90% create_coalesce_list_for_region                                 
>  ▒
>       + 1.63% build_ssa_conflict_graph                                        
>  ▒
>       + 0.35% coalesce_ssa_name                           
> 
> -O1
> -   17.53%    17.53%           842  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 12.39% add_ssa_edge                                                   
>  ▒
>       + 1.48% create_coalesce_list_for_region                                 
>  ▒
>       + 0.82% solve_constraints                                               
>  ▒
>       + 0.71% calculate_live_ranges                                           
>  ▒
>       + 0.64% add_implicit_graph_edge                                         
>  ▒
>       + 0.41% insert_phi_nodes_for                                            
>  ▒
>       + 0.34% build_ssa_conflict_graph                      
> patched:
> -    5.79%     5.00%           167  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 1.41% add_ssa_edge                                                    
>  ▒
>       + 0.88% calculate_live_ranges                                           
>  ▒
>       + 0.75% add_implicit_graph_edge                                         
>  ▒
>       + 0.68% solve_constraints                                               
>  ▒
>       + 0.48% insert_phi_nodes_for                                            
>  ▒
>       + 0.45% build_ssa_conflict_graph                   
> 
> -O3
> -   12.37%    12.34%          1145  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 9.14% add_ssa_edge                                                    
>  ▒
>       + 0.80% create_coalesce_list_for_region                                 
>  ▒
>       + 0.69% add_implicit_graph_edge                                         
>  ▒
>       + 0.54% solve_constraints                                               
>  ▒
>       + 0.34% calculate_live_ranges                                           
>  ▒
>       + 0.27% insert_phi_nodes_for                                            
>  ▒
>       + 0.21% build_ssa_conflict_graph                 
> -    4.36%     3.86%           227  cc1      cc1               [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                           
>  ▒
>       + 0.98% add_ssa_edge                                                    
>  ▒
>       + 0.86% add_implicit_graph_edge                                         
>  ▒
>       + 0.64% solve_constraints                                               
>  ▒
>       + 0.57% calculate_live_ranges                                           
>  ▒
>       + 0.32% build_ssa_conflict_graph                                        
>  ▒
>       + 0.29% mark_all_vars_used_1                                            
>  ▒
>       + 0.20% insert_phi_nodes_for                                            
>  ▒
>       + 0.16% create_coalesce_list_for_region             
> 
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Any objections?
> 
> Thanks,
> Richard.
> 
> 2018-10-18  Steven Bosscher <ste...@gcc.gnu.org>
>       Richard Biener  <rguent...@suse.de>
> 
>       * bitmap.h: Update data structure documentation, including a
>       description of bitmap views as either linked-lists or splay trees.
>       (struct bitmap_element_def): Update comments for splay tree bitmaps.
>       (struct bitmap_head_def): Likewise.
>       (bitmap_list_view, bitmap_tree_view): New prototypes.
>       (debug_bitmap, debug_bitmap_file, bitmap_print): Update prototypes.
>       (dump_bitmap): Update to take non-const bitmap.
>       (bitmap_initialize_stat): Initialize a bitmap_head's indx and
>       tree_form fields.
>       (bmp_iter_set_init): Assert the iterated bitmaps are in list form.
>       (bmp_iter_and_init, bmp_iter_and_compl_init): Likewise.
>       * bitmap.c (next_bitmap_desc_id): Make unsigned.
>       (get_bitmap_descriptor): Make sure there are no more than 2^31
>       bitmap descriptors.
>       (bitmap_elem_to_freelist): Unregister overhead of a released bitmap
>       element here.
>       (bitmap_element_free): Remove.
>       (bitmap_elt_clear_from): Work on splay tree bitmaps.
>       (bitmap_list_link_element): Renamed from bitmap_element_link.  Move
>       this function similar ones such that linked-list bitmap implementation
>       functions are grouped.
>       (bitmap_list_unlink_element): Renamed from bitmap_element_unlink,
>       and moved for grouping.
>       (bitmap_list_insert_element_after): Renamed from
>       bitmap_elt_insert_after, and moved for grouping.
>       (bitmap_list_find_element): New function spliced from bitmap_find_bit.
>       (bitmap_tree_link_left, bitmap_tree_link_right,
>       bitmap_tree_rotate_left, bitmap_tree_rotate_right, bitmap_tree_splay,
>       bitmap_tree_link_element, bitmap_tree_unlink_element,
>       bitmap_tree_find_element): New functions for splay-tree bitmap
>       implementation.
>       (bitmap_element_link, bitmap_element_unlink, bitmap_elt_insert_after):
>       Renamed and moved, see above entries.
>       (bitmap_tree_listify_from): New function to convert part of a splay
>       tree bitmap to a linked-list bitmap.
>       (bitmap_list_view): Convert a splay tree bitmap to linked-list form.
>       (bitmap_tree_view): Convert a linked-list bitmap to splay tree form.
>       (bitmap_find_bit, bitmap_clear, bitmap_clear_bit, bitmap_set_bit,
>       bitmap_single_bit_set_p, bitmap_first_set_bit, bitmap_last_set_bit):
>       Handle splay tree bitmaps.
>       (bitmap_copy, bitmap_count_bits, bitmap_and, bitmap_and_into,
>       bitmap_elt_copy, bitmap_and_compl, bitmap_and_compl_into,
>       bitmap_compl_and_into, bitmap_elt_ior, bitmap_ior, bitmap_ior_into,
>       bitmap_xor, bitmap_xor_into, bitmap_equal_p, bitmap_intersect_p,
>       bitmap_intersect_compl_p, bitmap_ior_and_compl,
>       bitmap_ior_and_compl_into, bitmap_set_range, bitmap_clear_range,
>       bitmap_hash): Reject trying to act on splay tree bitmaps.  Make
>       corresponding changes to use linked-list specific bitmap_element
>       manipulation functions as applicable for efficiency.
>       (debug_bitmap_file): Handle splay tree bitmaps by converting the
>       bitmap to linked-list form and back.
>       (bitmap_print): Likewise.
>       (debug_bitmap): Take a non-const bitmap.
> 
>       PR tree-optimization/63155
>       * tree-ssa-propagate.c (ssa_prop_init): Use tree-view for the
>       SSA edge worklists.
>       * tree-ssa-coalesce.c (coalesce_ssa_name): Populate used_in_copies
>       in tree-view.
I'm a fan.  No objections from me.

It likely invalidates the analysis I did last year where jump threading
from VRP was important to optimize the bitmap stuff well.  But that
analysis probably needs to redone anyway and I'm unlikely to be able to
push on removal of VRP threading this cycle anyway.

Jeff


Reply via email to