https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123292
Bug ID: 123292
Summary: Doesn't recognise open-coded memchr() (like in
std::count(char *, char *, char)), so produces code
47% slower than if it used memchr()
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: nabijaczleweli at nabijaczleweli dot xyz
Target Milestone: ---
With a
char buf[64 * 1024];
ssize_t rd = read(fd, buf, sizeof(buf));
preamble
acc += std::count(buf, buf + rd, '\n');
is ~45% slower than
auto newitr = buf;
auto len = rd;
char * itr;
// This is still suboptimal: a while(itr != end && *itr == '\n') ++acc; will
// be better if the data has consecutive needles
while(len && (newitr = static_cast<char *>(std::memchr(itr = newitr, '\n', len)))) {
++acc.newlines;
++newitr;
len -= newitr - itr;
}
on GCC 15.2.0-12 and 12.2.0-3.
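
For reference, the two variants being compared can be sketched as standalone functions. This is a minimal sketch, not the code from the linked program: the helper names count_newlines_std and count_newlines_memchr are hypothetical, and the memchr loop is restructured around an end pointer instead of the len counter used above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts '\n' bytes with std::count (the slower variant).
std::size_t count_newlines_std(const char *buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    return static_cast<std::size_t>(std::count(buf, buf + len, '\n'));
}

// Hypothetical helper: counts '\n' bytes by calling memchr repeatedly,
// resuming one byte past each hit (the faster variant).
std::size_t count_newlines_memchr(const char *buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::size_t acc = 0;
    const char *cur = buf;
    const char *const end = buf + len;
    while (cur != end) {
        const void *hit =
            std::memchr(cur, '\n', static_cast<std::size_t>(end - cur));
        if (hit == nullptr)
            break;  // no further needles in the remaining range
        ++acc;
        cur = static_cast<const char *>(hit) + 1;
    }
    return acc;
}
```

Both functions return the same count for any buffer; only the way the scan is expressed differs.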
Full program and dataset at
<https://lfs.nabijaczleweli.xyz/0032-std::count-vs-memchr>, where I observe,
for a 184M file with 290957 lines of line-delimited JSON, std::count takes
~164ms and memchr takes ~67ms on E5645.
`perf record` shows 71% in the lambda for std::count and 30% in __memchr_sse2
for memchr.
I can reproduce the ~45% slowdown on an i5-1235U.
GCC should understand that for a `Byte *beg, *end`, `while(beg != end && *beg
!= needle) ++beg;` is an open-coding of memchr() and reify that to memchr().