https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78847
Bug ID: 78847
Summary: pointer arithmetic from c++ ranged-based for loop not optimized
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: krister.walfridsson at gmail dot com
Target Milestone: ---

GCC has some problems eliminating overhead from C++ range-based for loops.
Consider the program

    #include <stddef.h>
    #include <cstring>
    #include <experimental/string_view>

    using string_view = std::experimental::string_view;

    class Foo {
        constexpr static size_t Length = 9;
        char ascii_[Length];
    public:
        Foo();
        string_view view() const {
            return string_view(ascii_, Length);
        }
    };

    void testWithLoopValue(const Foo foo, size_t ptr, char *buf_) {
        for (auto c : foo.view())
            buf_[ptr++] = c;
    }

compiled as

    g++ -O3 -S -std=c++1z k.cpp

ldist determines that this is a memcpy of length expressed as _14:

    _18 = (unsigned long) &MEM[(void *)&foo + 9B];
    _4 = &foo.ascii_ + 1;
    _3 = (unsigned long) _4;
    _16 = _18 + 1;
    _14 = _16 - _3;

and dom3 improves this to

    _18 = (unsigned long) &MEM[(void *)&foo + 9B];
    _3 = (unsigned long) &MEM[(void *)&foo + 1B];
    _16 = _18 + 1;
    _14 = _16 - _3;

But this is not further simplified to 9 until combine, where it is too late,
and a call to memcpy is generated instead of the expected inlined version.
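
For reference, the length computation in the dom3 dump can be replayed by hand to confirm that it folds to the constant 9: (&foo + 9 + 1) - (&foo + 1) = 9. The following is only an illustrative sketch, not GCC internals. The names `kLength`, `replay_length`, and `copy_fixed` are hypothetical, and the integer arithmetic assumes a flat address space where `uintptr_t` differences match byte offsets. `copy_fixed` shows the constant-size copy the loop is expected to reduce to.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Foo::Length from the report.
constexpr std::size_t kLength = 9;

// Replays the dom3 statements on integer addresses.
// t18, t3, t16, t14 correspond to _18, _3, _16, _14 in the dump.
std::size_t replay_length(const char (&foo)[kLength]) {
    std::uintptr_t t18 = reinterpret_cast<std::uintptr_t>(foo + 9); // &MEM[&foo + 9B]
    std::uintptr_t t3  = reinterpret_cast<std::uintptr_t>(foo + 1); // &MEM[&foo + 1B]
    std::uintptr_t t16 = t18 + 1;
    std::uintptr_t t14 = t16 - t3;  // (foo + 10) - (foo + 1)
    return static_cast<std::size_t>(t14);
}

// Hand-written equivalent of testWithLoopValue once the length is
// known to be the compile-time constant kLength.
void copy_fixed(const char (&src)[kLength], std::size_t pos, char *buf) {
    std::memcpy(buf + pos, src, kLength);
}
```

With the length visible as a constant, a compiler is typically able to expand the `memcpy` in `copy_fixed` inline rather than emit a library call, which is the code the report expects for the loop.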