http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57706
Bug ID: 57706
Summary: LRA is bottleneck while compiling LTO firefox
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org

While building Firefox, one of the ltrans partitions gets stuck with the following profile:

CPU: AMD64 family10, speed 2100 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000
samples  %        image name      app name        symbol name
84432    27.1889  lto1            lto1            ggc_internal_alloc_stat(unsigned long)
5490     1.7679   libc-2.11.1.so  libc-2.11.1.so  _int_malloc
4746     1.5283   lto1            lto1            bitmap_set_bit(bitmap_head_def*, int)
4155     1.3380   libc-2.11.1.so  libc-2.11.1.so  memset
3190     1.0272   lto1            lto1            hash_table_mod1(unsigned int, unsigned int)
3029     0.9754   lto1            lto1            for_each_rtx_1(rtx_def*, int, int (*)(rtx_def**, void*), void*)
2860     0.9210   lto1            lto1            bitmap_bit_p(bitmap_head_def*, int)
2325     0.7487   lto1            lto1            df_note_compute(bitmap_head_def*)
2173     0.6998   as              as              hash_lookup
2102     0.6769   lto1            lto1            record_reg_classes(int, int, rtx_def**, machine_mode*, char const**, rtx_def*, reg_class*)
1859     0.5986   lto1            lto1            constrain_operands(int)
1804     0.5809   lto1            lto1            hash_table<variable_hasher, xcallocator>::find_slot_with_hash(void const*, unsigned int, insert_option)
1674     0.5391   libc-2.11.1.so  libc-2.11.1.so  malloc
1660     0.5346   lto1            lto1            operand_equal_p(tree_node const*, tree_node const*, unsigned int)
1653     0.5323   lto1            lto1            htab_find_slot_with_hash
1543     0.4969   libc-2.11.1.so  libc-2.11.1.so  _int_free
1538     0.4953   lto1            lto1            get_attr_enabled(rtx_def*)
1511     0.4866   lto1            lto1            mem_attrs_eq_p(mem_attrs const*, mem_attrs const*)
1376     0.4431   libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate

 integrated RA           :  57.28 (11%) usr   0.21 ( 3%) sys  57.51 (11%) wall  382450 kB (106%) ggc
 LRA non-specific        :   5.35 ( 1%) usr   0.02 ( 0%) sys   5.43 ( 1%) wall   24447 kB ( 7%) ggc
 LRA virtuals elimination:   0.35 ( 0%) usr   0.01 ( 0%) sys   0.35 ( 0%) wall    8263 kB ( 2%) ggc
 LRA reload inheritance  :   0.64 ( 0%) usr   0.01 ( 0%) sys   0.78 ( 0%) wall   11556 kB ( 3%) ggc
 LRA create live ranges  :   1.11 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    2973 kB ( 1%) ggc
 LRA hard reg assignment : 166.89 (33%) usr   0.03 ( 0%) sys 166.96 (33%) wall       0 kB ( 0%) ggc
 LRA coalesce pseudo regs:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :  10.24 ( 2%) usr   0.04 ( 1%) sys  10.31 ( 2%) wall   51758 kB (14%) ggc
 load CSE after reload   :   2.02 ( 0%) usr   0.01 ( 0%) sys   2.10 ( 0%) wall     185 kB ( 0%) ggc
 ree                     :   0.21 ( 0%) usr   0.02 ( 0%) sys   0.19 ( 0%) wall     696 kB ( 0%) ggc
 thread pro- & epilogue  :   0.78 ( 0%) usr   0.00 ( 0%) sys   0.76 ( 0%) wall   21050 kB ( 6%) ggc
 if-conversion 2         :   0.10 ( 0%) usr   0.02 ( 0%) sys   0.16 ( 0%) wall     214 kB ( 0%) ggc
 combine stack adjustments:   0.13 ( 0%) usr   0.02 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2              :   0.77 ( 0%) usr   0.01 ( 0%) sys   0.70 ( 0%) wall    2982 kB ( 1%) ggc
 rename registers        :   3.87 ( 1%) usr   0.00 ( 0%) sys   3.55 ( 1%) wall   16083 kB ( 4%) ggc
 hard reg cprop          :   1.61 ( 0%) usr   0.01 ( 0%) sys   1.61 ( 0%) wall     821 kB ( 0%) ggc
 scheduling 2            :  11.50 ( 2%) usr   0.03 ( 0%) sys  11.47 ( 2%) wall   15888 kB ( 4%) ggc
 machine dep reorg       :   1.81 ( 0%) usr   0.01 ( 0%) sys   1.71 ( 0%) wall     590 kB ( 0%) ggc
 reorder blocks          :   1.26 ( 0%) usr   0.03 ( 0%) sys   1.12 ( 0%) wall   15841 kB ( 4%) ggc
 shorten branches        :   0.96 ( 0%) usr   0.00 ( 0%) sys   1.13 ( 0%) wall       0 kB ( 0%) ggc
 reg stack               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      69 kB ( 0%) ggc
 final                   :   6.98 ( 1%) usr   0.46 ( 7%) sys   7.09 ( 1%) wall  129826 kB (36%) ggc
 variable output         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall     669 kB ( 0%) ggc
 symout                  :  15.92 ( 3%) usr   0.14 ( 2%) sys  26.70 ( 5%) wall  406238 kB (113%) ggc
 variable tracking       :  14.50 ( 3%) usr   0.03 ( 0%) sys  14.71 ( 3%) wall  103487 kB (29%) ggc
 var-tracking dataflow   :  11.07 ( 2%) usr   0.01 ( 0%) sys  10.80 ( 2%) wall    2108 kB ( 1%) ggc
 var-tracking emit       :   9.11 ( 2%) usr   0.02 ( 0%) sys   9.26 ( 2%) wall  119939 kB (33%) ggc
 tree if-combine         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      66 kB ( 0%) ggc
 straight-line strength reduction:   0.34 ( 0%) usr   0.01 ( 0%) sys   0.27 ( 0%) wall    1583 kB ( 0%) ggc
 unaccounted optimizations:   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation     :   4.49 ( 1%) usr   1.21 (17%) sys   5.41 ( 1%) wall   56815 kB (16%) ggc
 remove unused locals    :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall      17 kB ( 0%) ggc
 address taken           :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall       3 kB ( 0%) ggc
 unaccounted todo        :   2.71 ( 1%) usr   0.42 ( 6%) sys   3.19 ( 1%) wall     225 kB ( 0%) ggc
 rebuild frequencies     :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall      18 kB ( 0%) ggc
 repair loop structures  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   : 499.43             7.04           512.56             360511 kB
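Which phase dominates can be checked mechanically by ranking the per-phase time-report lines by user time. What follows is a minimal Python sketch. It assumes one phase per line, laid out as "name : usr (pct%) usr ..." with a separate TOTAL line, as GCC prints it. Other GCC versions may use a different layout. `rank_phases`, `PHASE_RE` and `TOTAL_RE` are hypothetical helpers, not part of GCC.

```python
import re

# Assumed layout of one per-phase line (hypothetical regex, matches the report format):
#  " LRA hard reg assignment : 166.89 (33%) usr   0.03 ( 0%) sys 166.96 (33%) wall ..."
PHASE_RE = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr")
# The TOTAL line carries bare numbers with no percentages; its first number is usr time.
TOTAL_RE = re.compile(r"^\s*TOTAL\s*:\s*(?P<usr>\d+\.\d+)")


def rank_phases(report: str, top: int = 3) -> list[tuple[str, float, float]]:
    """Return the `top` phases by user time as (name, usr_seconds, share_of_TOTAL_usr)."""
    phases = []
    total_usr = None
    for line in report.splitlines():
        if m := TOTAL_RE.match(line):
            total_usr = float(m.group("usr"))
        elif m := PHASE_RE.match(line):
            phases.append((m.group("name"), float(m.group("usr"))))
    if total_usr is None:
        raise ValueError("report has no TOTAL line")
    # Largest user time first.
    phases.sort(key=lambda p: p[1], reverse=True)
    return [(name, usr, usr / total_usr) for name, usr in phases[:top]]


if __name__ == "__main__":
    # Three phase lines and the TOTAL line copied from the report.
    sample = """\
 integrated RA           :  57.28 (11%) usr   0.21 ( 3%) sys  57.51 (11%) wall  382450 kB (106%) ggc
 LRA hard reg assignment : 166.89 (33%) usr   0.03 ( 0%) sys 166.96 (33%) wall       0 kB ( 0%) ggc
 symout                  :  15.92 ( 3%) usr   0.14 ( 2%) sys  26.70 ( 5%) wall  406238 kB (113%) ggc
 TOTAL                   : 499.43             7.04           512.56             360511 kB
"""
    for name, usr, share in rank_phases(sample, top=2):
        print(f"{name:<24} {usr:7.2f}s {share:6.1%}")
```

On the full report, "LRA hard reg assignment" ranks first with 166.89 s of the 499.43 s total user time (about 33%). That is nearly three times "integrated RA" at 57.28 s.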