https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69609
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---

So checking-enabled trunk with -O2 -fno-checking results in

 PRE            :  25.58 ( 7%) usr   0.53 (33%) sys  26.14 ( 7%) wall     793 kB ( 0%) ggc
 reorder blocks : 286.65 (80%) usr   0.08 ( 5%) sys 287.01 (79%) wall  432597 kB (58%) ggc
 TOTAL          : 359.83          1.60         361.82          745954 kB

callgrind points at bb_to_key (called 4.7 million times), which accounts for 55% of all samples.

Ah, bb_to_key does FOR_EACH_EDGE on the predecessors, which might explain this ... I suppose either caching the result of this loop or limiting it would fix the slowness. The first option looks very desirable. In fact, pre-computing bb_to_key once looks very desirable; it seems the result won't change over the pass's execution? Well, maybe .end_of_trace will. But then adjusting the pre-computed values of all successors when adjusting .end_of_trace might be possible.
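The pre-computation idea above could be sketched roughly as follows. This is not the actual bb-reorder code: the block/edge types and the key function are simplified stand-ins (a pred walk that counts end-of-trace predecessors substitutes for the real FOR_EACH_EDGE priority computation). The point is the update rule: when a block's .end_of_trace changes, only its successors' keys can change, so only those cache slots need patching.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the bb-reorder data structures.
struct bb {
  int index;
  bool end_of_trace = false;
  std::vector<bb*> preds;
  std::vector<bb*> succs;
};

// Stand-in for the expensive part of bb_to_key: a walk over all
// predecessors (the FOR_EACH_EDGE loop in the real pass).
static int compute_key (const bb *b)
{
  int key = 0;
  for (const bb *p : b->preds)
    if (p->end_of_trace)
      key += 1;
  return key;
}

struct key_cache {
  std::vector<int> keys;

  // Pre-compute every block's key once, before the pass runs.
  void init (const std::vector<bb*> &blocks)
  {
    keys.resize (blocks.size ());
    for (const bb *b : blocks)
      keys[b->index] = compute_key (b);
  }

  // When a block becomes the end of a trace, only its successors'
  // keys depend on that flag -- patch those entries instead of
  // recomputing keys for the whole function.
  void mark_end_of_trace (bb *b)
  {
    b->end_of_trace = true;
    for (const bb *s : b->succs)
      keys[s->index] = compute_key (s);
  }

  int key (const bb *b) const { return keys[b->index]; }
};
```

With this, each bb_to_key query becomes an array lookup, and the per-update cost is proportional to the out-degree of the changed block rather than re-walking predecessor lists 4.7 million times.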