https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99105
--- Comment #8 from Martin Liška <marxin at gcc dot gnu.org> ---
This is what I see for GCC PGO in the train stage. It's from perf top:

  4.33%  cc1plus       [.] __gcov_indirect_call_profiler_v4
  2.28%  cc1plus       [.] __gcov_topn_values_profiler
  0.85%  cc1plus       [.] ggc_internal_alloc
  0.83%  [kernel]      [k] perf_event_task_tick
  0.72%  libc-2.32.so  [.] _int_malloc
  0.71%  cc1plus       [.] ht_lookup_with_hash
  0.53%  cc1plus       [.] grokdeclarator
  0.48%  cc1plus       [.] df_note_compute
  0.47%  cc1plus       [.] get_ref_base_and_extent
  0.45%  [kernel]      [k] clear_page_rep
  0.44%  cc1plus       [.] _cpp_lex_direct
  0.41%  cc1plus       [.] walk_tree_1
  0.41%  cc1plus       [.] et_splay
  0.41%  cc1plus       [.] bitmap_set_bit
  0.40%  libc-2.32.so  [.] _int_free
  0.39%  cc1plus       [.] bitmap_list_find_element
  0.36%  libc-2.32.so  [.] malloc
  0.35%  cc1plus       [.] operand_compare::operand_equal_p
  0.35%  cc1plus       [.] hash_table<named_decl_hash, false, xcallocator>::find_slot_with_ha

In the case of GCC, we emit 500 .gcda files.

@Honza: Can you please test whether my patch that uses glibc buffered I/O helps?
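To illustrate the buffered-I/O idea, here is a hypothetical C sketch. It is not the patch mentioned above and not libgcov's real .gcda writer. It assumes a simplified file consisting only of flat 64-bit counters (the real .gcda format also carries tags, lengths and a checksum), and `write_counters_buffered` and `read_counters` are illustrative names. It shows how a fully buffered stdio stream, configured with the standard `setvbuf`, coalesces many small `fwrite` calls into a few large `write(2)` system calls. That is the kind of saving buffered I/O could bring when hundreds of .gcda files are dumped at exit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: write n 64-bit counter values to `path` through a
   fully buffered stdio stream, so that the per-counter fwrite calls are
   coalesced into few write(2) system calls. Simplified format, not the
   real .gcda layout. Returns 0 on success, -1 on error. */
static int write_counters_buffered(const char *path, const int64_t *vals,
                                   size_t n)
{
  /* 64 KiB local buffer; it stays valid because the stream is closed
     before this function returns on every path. */
  char buf[1 << 16];

  assert(vals != NULL || n == 0);

  FILE *f = fopen(path, "wb");
  if (!f)
    return -1;

  /* setvbuf must be called before any other operation on the stream. */
  if (setvbuf(f, buf, _IOFBF, sizeof buf) != 0) {
    fclose(f);
    return -1;
  }

  /* Many small writes; they only fill the user-space buffer until it is
     full, at which point stdio issues one large write. */
  for (size_t i = 0; i < n; i++) {
    if (fwrite(&vals[i], sizeof vals[i], 1, f) != 1) {
      fclose(f);
      return -1;
    }
  }

  /* fclose flushes the remaining buffered bytes and reports any
     deferred write error. */
  return fclose(f) == 0 ? 0 : -1;
}

/* Read up to `max` counters back from `path` (for checking the sketch).
   Returns the number of values actually read. */
static size_t read_counters(const char *path, int64_t *out, size_t max)
{
  FILE *f = fopen(path, "rb");
  if (!f)
    return 0;
  size_t got = fread(out, sizeof out[0], max, f);
  fclose(f);
  return got;
}
```

With this approach the number of system calls per file grows roughly with the bytes written divided by the buffer size, rather than with the number of counters written.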