https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Cutting the switch in decodeToMCInst after case 693: (roughly halving it by
the number of source lines) gets us to

 tree PTA   : 129.70 ( 92%)   0.51 ( 14%) 130.28 ( 90%)  2279k (  0%)
 TOTAL      : 140.28          3.68        144.01          982M

a profile shows

Samples: 657K of event 'cycles:u', Event count (approx.): 873340708228
Overhead       Samples  Command  Shared Object  Symbol
  88.08%        578019  cc1plus  cc1plus        [.] bitmap_equal_p
   4.76%         31340  cc1plus  cc1plus        [.] equiv_class_lookup_or_a
   0.59%          4039  cc1plus  cc1plus        [.] bitmap_set_bit
   0.24%          1611  cc1plus  cc1plus        [.] bitmap_copy

the way we hash bitmaps is quite bad, we effectively hash set and a subset of
unset bits.  Adding a simple additional "hash", the number of set bits,
improves this to

Samples: 214K of event 'cycles:u', Event count (approx.): 283548833048
Overhead       Samples  Command  Shared Object  Symbol
  69.73%        148209  cc1plus  cc1plus        [.] bitmap_equal_p
   6.29%         13499  cc1plus  cc1plus        [.] equiv_class_lookup_or_add

of course we still have too many calls (or too large but almost equal bitmaps
here).  Still I have a handle on this.
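
For readers following along, here is a rough sketch of the idea of using the number of set bits as a cheap second "hash" that is checked before the expensive full comparison. This is not GCC's code: ToyBitmap, weak_hash, count_bits, Key and keys_equal are all made-up names, the bitmap is a plain dense vector rather than GCC's sparse bitmap, and the XOR hash is only a stand-in for "a hash that collides a lot".

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy bitmap: a dense vector of 64-bit words.
// (GCC's real bitmap is a sparse structure; this is only an illustration.)
using ToyBitmap = std::vector<std::uint64_t>;

// Counts how often we fall through to the full word-by-word comparison,
// playing the role that bitmap_equal_p plays in the profile.
static std::size_t full_compares = 0;

// Deliberately weak stand-in hash: XOR of all words, so many different
// bit sets end up with the same value.
std::uint64_t weak_hash(const ToyBitmap &b) {
  std::uint64_t h = 0;
  for (std::uint64_t w : b) h ^= w;
  return h;
}

// Number of set bits, computed portably (clears the lowest set bit per step).
std::size_t count_bits(const ToyBitmap &b) {
  std::size_t n = 0;
  for (std::uint64_t w : b)
    for (; w != 0; w &= w - 1) ++n;
  return n;
}

// Hash-table key that caches both the weak hash and the set-bit count.
struct Key {
  ToyBitmap bits;
  std::uint64_t hash;
  std::size_t popcount;
  explicit Key(ToyBitmap b)
      : bits(std::move(b)), hash(weak_hash(bits)), popcount(count_bits(bits)) {}
};

// Equality as a hash table would call it on candidates in one bucket.
// With use_popcount set, a mismatch in the set-bit count rejects the
// candidate before any word-by-word comparison happens.
bool keys_equal(const Key &a, const Key &b, bool use_popcount) {
  if (a.hash != b.hash) return false;
  if (use_popcount && a.popcount != b.popcount) return false;
  ++full_compares;
  return a.bits == b.bits;
}
```

The extra check only filters out candidates whose set-bit counts differ, so large bitmaps that are almost equal and happen to have the same count still reach the full comparison, which matches the remaining bitmap_equal_p time in the second profile.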