[Bug c++/123292] New: Doesn't recognise open-coded memchr() (like in std::count(char *, char *, char)), so produces code 47% slower than if it used memchr()

nabijaczleweli at nabijaczleweli dot xyz via Gcc-bugs Wed, 24 Dec 2025 06:56:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123292


            Bug ID: 123292
           Summary: Doesn't recognise open-coded memchr() (like in
                    std::count(char *, char *, char)), so produces code
                    47% slower than if it used memchr()
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nabijaczleweli at nabijaczleweli dot xyz
  Target Milestone: ---

With a 
  char buf[64 * 1024];
  ssize_t rd = read(fd, buf, sizeof(buf));
preamble
  acc += std::count(buf, buf + rd, '\n');
is ~45% slower than 
  auto newitr = buf;
  auto len    = rd;
  char * itr;
  // This is still suboptimal: a while(itr != end && *itr == '\n') ++acc; will
  // be better if the data has consecutive needles
  while(len && (newitr = static_cast<char *>(std::memchr(itr = newitr, '\n', len)))) {
        ++acc.newlines;
        ++newitr;
        len -= newitr - itr;
  }
on GCC 15.2.0-12 and 12.2.0-3.
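
For reference, the two approaches can be written as standalone functions. This is a minimal sketch, not the code from the linked program: the function names `count_newlines_std` and `count_newlines_memchr` and the `(buf, len)` signature are made up for illustration, and the memchr variant tracks an end pointer instead of the remaining length used in the snippet above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Baseline: count needles with std::count, as in the first snippet.
std::size_t count_newlines_std(const char *buf, std::size_t len) {
    return static_cast<std::size_t>(std::count(buf, buf + len, '\n'));
}

// Alternative: let memchr() locate each needle, then resume just past it.
// (Hypothetical helper; same idea as the loop in the report.)
std::size_t count_newlines_memchr(const char *buf, std::size_t len) {
    std::size_t acc = 0;
    const char *itr = buf;
    const char *end = buf + len;
    while (itr != end) {
        const void *hit =
            std::memchr(itr, '\n', static_cast<std::size_t>(end - itr));
        if (!hit)
            break;                                 // no more needles
        ++acc;
        itr = static_cast<const char *>(hit) + 1;  // skip the needle found
    }
    return acc;
}
```

Both functions should return the same count for any buffer; only the time spent is expected to differ.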

Full program and dataset at
<https://lfs.nabijaczleweli.xyz/0032-std::count-vs-memchr>, where I observe,
for a 184M file with 290957 lines of line-delimited JSON, std::count takes
~164ms and memchr takes ~67ms on E5645.

`perf record` shows 71% in the lambda for std::count and 30% in __memchr_sse2
for memchr.

I repro the ~45% regression on i5-1235U.

GCC should understand that for a `Byte *beg, *end`,
`while(beg != end && *beg != needle) ++beg;` is an open-coding of memchr()
and reify that to memchr().
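
A sketch of the equivalence in question, assuming the search loop advances while the current byte differs from the needle. The names `find_byte_loop` and `find_byte_memchr` are hypothetical, and returning `nullptr` for "not found" is chosen here only to match memchr()'s convention.

```cpp
#include <cstddef>
#include <cstring>

// Open-coded search: advance until the needle or the end of the range.
const char *find_byte_loop(const char *beg, const char *end, char needle) {
    while (beg != end && *beg != needle)
        ++beg;
    return beg == end ? nullptr : beg;
}

// The library call that the loop above could be replaced with.
const char *find_byte_memchr(const char *beg, const char *end, char needle) {
    return static_cast<const char *>(
        std::memchr(beg, static_cast<unsigned char>(needle),
                    static_cast<std::size_t>(end - beg)));
}
```

For any valid range `[beg, end)` the two functions should return the same pointer, which is the property a pattern-recognising optimisation would rely on.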
