https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84170
Bug ID: 84170 Summary: std::find_if performance issues Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: barannikov88 at gmail dot com Target Milestone: --- Hi, There are two internal implementations of std::find_if in bits/stl_algo.h. One is for the input iterator case, the other is for the random access iterator case. They are almost identical, except that the second is somewhat "optimized". > /// This is an overload used by find algos for the Input Iterator case. > template<typename _InputIterator, typename _Predicate> > inline _InputIterator > __find_if(_InputIterator __first, _InputIterator __last, > _Predicate __pred, input_iterator_tag) > { > while (__first != __last && !__pred(__first)) > ++__first; > return __first; > } > > /// This is an overload used by find algos for the RAI case. > template<typename _RandomAccessIterator, typename _Predicate> > _RandomAccessIterator > __find_if(_RandomAccessIterator __first, _RandomAccessIterator __last, > _Predicate __pred, random_access_iterator_tag) > { > typename iterator_traits<_RandomAccessIterator>::difference_type > __trip_count = (__last - __first) >> 2; > > for (; __trip_count > 0; --__trip_count) > { > if (__pred(__first)) > return __first; > ++__first; > > if (__pred(__first)) > return __first; > ++__first; > > if (__pred(__first)) > return __first; > ++__first; > > if (__pred(__first)) > return __first; > ++__first; > } > > switch (__last - __first) > { > case 3: > if (__pred(__first)) > return __first; > ++__first; > case 2: > if (__pred(__first)) > return __first; > ++__first; > case 1: > if (__pred(__first)) > return __first; > ++__first; > case 0: > default: > return __last; > } > } Manual unrolling increases the code size and reduces the chances of this function to be inlined. Inlining of this function is critical because it can contain indirect calls to the predicate function, and inlining can change this calls to direct improving performance. I would suggest to revert manual unrolling and let the compiler decide which optimization steps should be taken.