https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I can re-confirm the 16% compile-time regression. I went through some comparisons.

$ wc -l *.ssa
299231 tramp3d-v4.ii.015t.ssa
$ wc -l ../5/*.ssa
331115 ../5/tramp3d-v4.ii.018t.ssa

So, as a crude comparison, we already start with about 10% more statements. Now einline:

$ wc -l *.einline
692812 tramp3d-v4.ii.018t.einline
$ wc -l ../5/*.einline
724090 ../5/tramp3d-v4.ii.026t.einline

After einline we seem to have 4% more statements, while we do about the same number of inlines:

$ grep Inlining tramp3d-v4.ii.*einline | wc -l
28003
$ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
28685

At release_ssa we still have about 4% more:

$ wc -l *release_ssa*
348378 tramp3d-v4.ii.036t.release_ssa
$ wc -l ../5/*release_ssa*
365689 ../5/tramp3d-v4.ii.043t.release_ssa

There is no difference in the number of functions between the ssa and release_ssa dumps. What makes the functions bigger in GCC 5?

$ grep "^ .* = " *.release_ssa | wc -l
65028
$ grep "^ .* = " ../5/*.release_ssa | wc -l
72636

The number of statements is about the same. During the actual inlining, GCC 4.9 reports:

  Unit growth for small function inlining: 88536->114049 (28%)

while GCC 5 reports:

  Unit growth for small function inlining: 87943->97699 (11%)

The statement count difference seems to remain about 7% in the .optimized dumps. So perhaps the slowdown is not caused that much by the IPA passes; rather, we somehow manage to produce more code out of the C++ FE. I looked for interesting differences in the SSA dump.
Here are a few:

-;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
+;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)

 int __gthread_active_p() ()
 {
-  bool _1;
-  int _2;
+  static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
+  void * __gthread_active_ptr.111_2;
+  bool _3;
+  int _4;

   <bb 2>:
-  _1 = __gthrw_pthread_cancel != 0B;
-  _2 = (int) _1;
-  return _2;
+  __gthread_active_ptr.111_2 = __gthread_active_ptr;
+  _3 = __gthread_active_ptr.111_2 != 0B;
+  _4 = (int) _3;
+  return _4;
 }

... this looks like a header change, perhaps ...

 ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
 {
-  int _6;
+  int (*__vtbl_ptr_type) () * _2;
+  int _7;

   <bb 2>:
-  this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
-  *this_3(D) ={v} {CLOBBER};
-  _6 = 0;
-  if (_6 != 0)
+  _2 = &_ZTV13ObserverEvent + 16;
+  this_4(D)->_vptr.ObserverEvent = _2;
+  MEM[(struct &)this_4(D)] ={v} {CLOBBER};
+  _7 = 0;
+  if (_7 != 0)

... an extra temporary initializing the vtbl pointer. This is repeated many times ...
-;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
+;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)

 static Unique::Value_t Unique::get() ()
 {
   Value_t retval;
-  long int next_s.83_2;
-  long int next_s.84_3;
-  long int next_s.85_4;
-  Value_t _7;
+  long int next_s.83_3;
+  long int next_s.84_4;
+  long int next_s.85_5;
+  Value_t _9;

   <bb 2>:
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
-  next_s.83_2 = next_s;
-  next_s.84_3 = next_s.83_2;
-  next_s.85_4 = next_s.84_3 + 1;
-  next_s = next_s.85_4;
-  retval_6 = next_s.84_3;
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
-  _7 = retval_6;
-  return _7;
+  Pooma::DummyMutex::lock (&mutex_s);
+  next_s.83_3 = next_s;
+  next_s.84_4 = next_s.83_3;
+  next_s.85_5 = next_s.84_4 + 1;
+  next_s = next_s.85_5;
+  retval_7 = next_s.84_4;
+  Pooma::DummyMutex::unlock (&mutex_s);
+  _9 = retval_7;
+  return _9;
 }

... here we give up on ISRA ...

And we have about twice as much EH:

$ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
4816
$ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
8671

which, however, is optimized out by the time of release_ssa.
Another thing we may consider cleaning up in the next stage1 is getting rid of dead stores:

-  MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
-  MEM[(struct basic_string *)_2] ={v} {CLOBBER};
-  *_2 ={v} {CLOBBER};
-  *this_1(D) ={v} {CLOBBER};
+  MEM[(struct &)&D.570046] ={v} {CLOBBER};
+  MEM[(struct &)&D.570046] ={v} {CLOBBER};
+  D.570046 ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)this_1(D)] ={v} {CLOBBER};

Clobbers are dangerously common: there are 18K clobbers in the release_ssa dump out of 65K assignments, which makes them 29% of all the code. The number of clobbers only seems to go down in the tramp3d-v4.ii.166t.ehcleanup dump, and we still get a lot of redundancies:

<bb 32>:
D.581063 ={v} {CLOBBER};
D.581063 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
operator delete [] (begbuf_18);

Why are those not considered dead stores and DCEd out earlier?