https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug ID: 110287 Summary: _M_check_len is expensive Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- I am looking into ineffective codegen for loops controlled by std::vec based stack (see testcase in PR109849). The problem is that we fail to inline enough of implementation of std::push_back to fully SRA the stack which makes it very slow. The reason is that _M_realloc_insert currently does not fit in inlining limits for -O2. It seems to be inlined by clang. The first issue is rather large _M_check_len that currently does: size_type std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len (const struct vector * const this, size_type __n, const char * __s) { const size_type __len; const long unsigned int D.27747; long unsigned int _1; long unsigned int __n.3_2; long unsigned int _3; size_type iftmp.4_4; long unsigned int _5; long unsigned int _8; long int _10; long int _13; struct pair * _14; struct pair * _15; const long unsigned int & _18; <bb 2> [local count: 1073741824]: _15 = this_7(D)->D.26707._M_impl.D.26014._M_finish; _14 = this_7(D)->D.26707._M_impl.D.26014._M_start; _13 = _15 - _14; _10 = _13 /[ex] 8; _8 = (long unsigned int) _10; _1 = 1152921504606846975 - _8; __n.3_2 = __n; if (_1 < __n.3_2) goto <bb 3>; [0.04%] else goto <bb 4>; [99.96%] <bb 3> [local count: 429496]: std::__throw_length_error (__s_16(D)); <bb 4> [local count: 1073312329]: D.27747 = _8; if (__n.3_2 > _8) goto <bb 5>; [34.00%] else goto <bb 6>; [66.00%] <bb 5> [local count: 364926196]: <bb 6> [local count: 1073312330]: # _18 = PHI <&D.27747(4), &__n(5)> _3 = *_18; __len_11 = _3 + _8; D.27747 ={v} {CLOBBER(eol)}; if (_8 > __len_11) goto <bb 8>; [35.00%] else goto <bb 7>; [65.00%] <bb 7> [local count: 697653013]: _5 = MIN_EXPR <__len_11, 1152921504606846975>; <bb 8> [local count: 1073312330]: # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)> return iftmp.4_4; } Whis is used by _M_realloc_insert: _20 = std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len (this_18(D), 1, "vector::_M_realloc_insert"); So _n is 1. The test: _1 = 1152921504606846975 - _8; __n.3_2 = __n; if (_1 < __n.3_2) goto <bb 3>; [0.04%] else goto <bb 4>; [99.96%] <bb 3> [local count: 429496]: std::__throw_length_error (__s_16(D)); Can IMO be never true, since we would need to have already allocated vector of 1152921504606846975 elements which will not fit in memory anyway. This brings in the EH handling and error message. Perhaps for constantly sized _M_check_len and constantly sizes vector elements we can use __builtin_constant_p to avoid calling _M_check_len from _M_realloc_insert and invent _M_safe_check_len that avoids this test. (for IPA optimizations to work out that there will be no EH and call is cheaper we need to have the test at caller side instead in _M_check_len).