https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is _M_realloc_insert at release_ssa time:

Released 63 names, 165.79%, removed 63 holes
void std::vector<pair_t>::_M_realloc_insert<const pair_t&> (struct vector * const this, struct iterator __position, const struct pair_t & __args#0)
{
  struct pair_t * const __position;
  struct pair_t * __new_finish;
  struct pair_t * __old_finish;
  struct pair_t * __old_start;
  long unsigned int _1;
  struct pair_t * _2;
  struct pair_t * _3;
  long int _4;
  long unsigned int _5;
  struct pair_t * _6;
  const size_type _10;
  long int _13;
  struct pair_t * iftmp.5_15;
  struct pair_t * _17;
  struct _Vector_impl * _18;
  long unsigned int _22;
  long int _23;
  long unsigned int _24;
  long unsigned int _25;
  struct pair_t * _26;
  long unsigned int _36;

  <bb 2> [local count: 1073741824]:
  __position_27 = MEM[(struct __normal_iterator *)&__position];
  _10 = std::vector<pair_t>::_M_check_len (this_8(D), 1, "vector::_M_realloc_insert");
  __old_start_11 = this_8(D)->D.25975._M_impl.D.25282._M_start;
  __old_finish_12 = this_8(D)->D.25975._M_impl.D.25282._M_finish;
  _13 = __position_27 - __old_start_11;
  if (_10 != 0)
    goto <bb 3>; [54.67%]
  else
    goto <bb 4>; [45.33%]

  <bb 3> [local count: 587014656]:
  _18 = &MEM[(struct _Vector_base *)this_8(D)]._M_impl;
  _17 = std::__new_allocator<pair_t>::allocate (_18, _10, 0B);

  <bb 4> [local count: 1073741824]:
  # iftmp.5_15 = PHI <0B(2), _17(3)>
  _1 = (long unsigned int) _13;
  _2 = iftmp.5_15 + _1;
  *_2 = *__args#0_14(D);
  if (_13 > 0)
    goto <bb 5>; [41.48%]
  else
    goto <bb 6>; [58.52%]

  <bb 5> [local count: 445388112]:
  __builtin_memmove (iftmp.5_15, __old_start_11, _1);

  <bb 6> [local count: 1073741824]:
  _36 = _1 + 8;
  __new_finish_16 = iftmp.5_15 + _36;
  _23 = __old_finish_12 - __position_27;
  if (_23 > 0)
    goto <bb 7>; [41.48%]
  else
    goto <bb 8>; [58.52%]

  <bb 7> [local count: 445388112]:
  _24 = (long unsigned int) _23;
  __builtin_memcpy (__new_finish_16, __position_27, _24);

  <bb 8> [local count: 1073741824]:
  _25 = (long unsigned int) _23;
  _26 = __new_finish_16 + _25;
  _3 = this_8(D)->D.25975._M_impl.D.25282._M_end_of_storage;
  _4 = _3 - __old_start_11;
  if (__old_start_11 != 0B)
    goto <bb 9>; [53.47%]
  else
    goto <bb 10>; [46.53%]

  <bb 9> [local count: 574129752]:
  _22 = (long unsigned int) _4;
  operator delete (__old_start_11, _22);

  <bb 10> [local count: 1073741824]:
  this_8(D)->D.25975._M_impl.D.25282._M_start = iftmp.5_15;
  this_8(D)->D.25975._M_impl.D.25282._M_finish = _26;
  _5 = _10 * 8;
  _6 = iftmp.5_15 + _5;
  this_8(D)->D.25975._M_impl.D.25282._M_end_of_storage = _6;
  return;

}

First it is not clear to me why we need memmove at all?

So first issue is:

  <bb 2> [local count: 1073741824]:
  __position_27 = MEM[(struct __normal_iterator *)&__position];
  _10 = std::vector<pair_t>::_M_check_len (this_8(D), 1, "vector::_M_realloc_insert");
  __old_start_11 = this_8(D)->D.25975._M_impl.D.25282._M_start;
  __old_finish_12 = this_8(D)->D.25975._M_impl.D.25282._M_finish;
  _13 = __position_27 - __old_start_11;
  if (_10 != 0)
    goto <bb 3>; [54.67%]
  else
    goto <bb 4>; [45.33%]

Without inlining _M_check_len early we cannot work out the return value
range, since we need to know that parameter 2 is 1 and not 0. Adding a
__builtin_unreachable check after the call helps to optimize away the
if (_10 != 0) test, but I need to do something about the inliner accounting
the conditional to function body size.

  <bb 4> [local count: 1073741824]:
  # iftmp.5_15 = PHI <0B(2), _17(3)>
  _1 = (long unsigned int) _13;
  _2 = iftmp.5_15 + _1;
  *_2 = *__args#0_14(D);
  if (_13 > 0)
    goto <bb 5>; [41.48%]
  else
    goto <bb 6>; [58.52%]

  <bb 5> [local count: 445388112]:
  __builtin_memmove (iftmp.5_15, __old_start_11, _1);

Is this code about inserting a value into the middle? Since push_back always
initializes the iterator to point to the end, this seems quite silly to do.
Can't we do something like _M_realloc_append?
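
To make the _M_realloc_append idea concrete, here is a minimal sketch of an append-only reallocation path. It is only an illustration under stated assumptions, not libstdc++ code: tiny_vec, check_len, max_elems and realloc_append are hypothetical names, pair_t is assumed to be a trivially copyable 8-byte struct as in the dump, and the __builtin_unreachable after check_len stands in for the range hint discussed above. Because the insertion point is known to be the end, there is no position argument, only one block copy of the old elements, and no second copy for a tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for the 8-byte pair_t seen in the dump.
struct pair_t { int first; int second; };

// Hypothetical minimal vector, for illustration only.
struct tiny_vec {
    pair_t* start = nullptr;
    pair_t* finish = nullptr;
    pair_t* end_of_storage = nullptr;

    tiny_vec() = default;
    tiny_vec(const tiny_vec&) = delete;
    tiny_vec& operator=(const tiny_vec&) = delete;
    ~tiny_vec() { ::operator delete(start); }

    static std::size_t max_elems() {
        return static_cast<std::size_t>(-1) / sizeof(pair_t) / 2;
    }
    std::size_t size() const { return static_cast<std::size_t>(finish - start); }
    std::size_t capacity() const {
        return static_cast<std::size_t>(end_of_storage - start);
    }

    // Growth policy: room for at least n more elements, doubling otherwise.
    // For n >= 1 the result is never 0.
    std::size_t check_len(std::size_t n) const {
        if (max_elems() - size() < n)
            throw std::length_error("tiny_vec::check_len");
        std::size_t len = size() + (size() > n ? size() : n);
        if (len > max_elems())
            len = max_elems();
        return len;
    }

    // Append-only reallocation: called only when the buffer is full.
    void realloc_append(const pair_t& x) {
        assert(finish == end_of_storage);
        const std::size_t len = check_len(1);
        // Range hint (GCC/Clang builtin): len cannot be 0 here, so the
        // compiler may drop a "len != 0" test even if check_len is not inlined.
        if (len == 0)
            __builtin_unreachable();

        const std::size_t old_size = size();
        pair_t* new_start =
            static_cast<pair_t*>(::operator new(len * sizeof(pair_t)));

        // Construct the new element first, so x may alias an old element.
        ::new (static_cast<void*>(new_start + old_size)) pair_t(x);

        // One copy of all old elements; buffers never overlap, so memcpy is enough.
        if (old_size != 0)
            std::memcpy(new_start, start, old_size * sizeof(pair_t));

        ::operator delete(start);
        start = new_start;
        finish = new_start + old_size + 1;
        end_of_storage = new_start + len;
    }

    void push_back(const pair_t& x) {
        if (finish != end_of_storage) {
            ::new (static_cast<void*>(finish)) pair_t(x);
            ++finish;
        } else {
            realloc_append(x);
        }
    }
};
```

Compared with the dump above, this path has no _13 > 0 and _23 > 0 tests and no memmove; whether the real implementation can be restructured this way depends on details of std::vector that this sketch ignores (allocators, non-trivial element types, exception safety).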