https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
r15-9175 on x86_64 now shows (with release checking, built with GCC 7, not
bootstrapped):
Samples: 112K of event 'cycles:Pu', Event count (approx.): 139897585915
Overhead Samples Command Shared Object Symbol
6.42% 7028 cc1plus cc1plus [.]
bitmap_and_into(bitma
6.19% 6851 cc1plus cc1plus [.]
bitmap_list_insert_el
3.96% 4408 cc1plus cc1plus [.]
bitmap_ior_into(bitma
3.28% 3695 cc1plus cc1plus [.]
bitmap_set_bit(bitmap
2.75% 3099 cc1plus cc1plus [.]
get_ref_base_and_exte
2.31% 2532 cc1plus cc1plus [.]
bitmap_and(bitmap_hea
1.85% 2052 cc1plus cc1plus [.]
bitmap_ior_and_compl(
1.83% 2027 cc1plus cc1plus [.]
bitmap_copy(bitmap_he
1.32% 1482 cc1plus cc1plus [.]
bitmap_bit_p(bitmap_h
1.24% 1439 cc1plus cc1plus [.]
bitmap_set_range(bitm
1.18% 1329 cc1plus cc1plus [.]
lra_create_live_range
I'll note the -ftime-report is quite flat, DF is sticking out (every RTL pass
hits on this) as well as LRA (for the same reason, plus
lra_create_live_ranges).
The bitmap_list_insert_element_after hit is actually mispredictions on
bitmap_element_allocate wrt the en-block queued free elements from
bitmap_elt_clear_from:
element = bit_obstack->elements;
if (element)
/* Use up the inner list first before looking at the next
element of the outer list. */
if (element->next)
{
bit_obstack->elements = element->next;
possibly having two free element lists, one for singletons and one for
lists would be friendlier, thus
if (bit_obstack->elements)
...
else if (bit_obstack->multi_elements)
...