https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95702

            Bug ID: 95702
           Summary: ranges::transform missing vectorization opportunity
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

GCC 10 is able to vectorize this code (https://godbolt.org/z/tsbfQw):

#include <algorithm>
#include <functional>
#include <vector>

void foo(std::vector<double> &u, std::vector<double> const &v)
{
  std::transform(std::begin(u), std::end(u),
                 std::begin(v),
                 std::begin(u),
                 std::plus());
}


However, it fails to vectorize the ranges equivalent
(https://godbolt.org/z/D49hJa):

#include <algorithm>
#include <functional>
#include <vector>

void foo(std::vector<double> &u, std::vector<double> const &v)
{
  std::ranges::transform(u, v, std::begin(u), std::plus());
}

The option -fopt-info-vec-missed reveals why:

bits/ranges_algo.h:986:29: missed: not vectorized: number of iterations cannot be computed.
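
For reference, the loop shape that the diagnostic points at can be written out
by hand. This is only a sketch: add_two_exit is a hypothetical name, and the
body is specialised to vector<double> and operator+ rather than going through
projections and std::__invoke as the library does. It has the same two
independent exit tests as the generic loop in bits/ranges_algo.h, so the trip
count is min(u.size(), v.size()) and is not stated anywhere explicitly.

```cpp
#include <vector>

// Hypothetical helper, not library code: the two-condition loop of the
// generic ranges::transform, written by hand for vector<double>.
void add_two_exit(std::vector<double> &u, std::vector<double> const &v)
{
  auto first1 = u.begin();
  auto last1 = u.end();
  auto first2 = v.begin();
  auto last2 = v.end();
  auto out = u.begin();

  // Two separate exit tests: the loop stops when either range is exhausted.
  for (; first1 != last1 && first2 != last2; ++first1, ++first2, ++out)
    *out = *first1 + *first2;
}
```

When v is shorter than u, the trailing elements of u are left untouched, which
is the behaviour any rewritten loop has to preserve.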

Would an optimization like the following one make sense?

        template<input_iterator _Iter1, sentinel_for<_Iter1> _Sent1,
                 input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
                 weakly_incrementable _Out, copy_constructible _Fp,
                 typename _Proj1 = identity, typename _Proj2 = identity>
          requires indirectly_writable<_Out,
                                       indirect_result_t<_Fp&,
                                       projected<_Iter1, _Proj1>,
                                       projected<_Iter2, _Proj2>>>
          constexpr binary_transform_result<_Iter1, _Iter2, _Out>
          operator()(_Iter1 __first1, _Sent1 __last1,
                     _Iter2 __first2, _Sent2 __last2,
                     _Out __result, _Fp __binary_op,
                     _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const
          {
+           if constexpr (random_access_iterator<_Iter1>
+                      && random_access_iterator<_Iter2>)
+             {
+               auto __d1 = ranges::distance(__first1, __last1);
+               auto __d2 = ranges::distance(__first2, __last2);
+               auto __n = std::min(__d1, __d2);
+               for (decltype(__n) __i = 0; __i < __n;
+                    (void)++__i, (void)++__first1, (void)++__first2,
+                    ++__result)
+                 *__result = std::__invoke(__binary_op,
+                                           std::__invoke(__proj1, *__first1),
+                                           std::__invoke(__proj2, *__first2));
+             }
+           else
                  for (; __first1 != __last1 && __first2 != __last2;
                       ++__first1, (void)++__first2, ++__result)
                    *__result = std::__invoke(__binary_op,
                                              std::__invoke(__proj1, *__first1),
                                              std::__invoke(__proj2, *__first2));
                return {std::move(__first1), std::move(__first2),
                        std::move(__result)};
          }
