| Field | Value |
|---|---|
| Issue | 169935 |
| Summary | [x86] Vectorization of `std::replace` and its hand-written equivalent is not very useful and is often even somewhat harmful |
| Labels | new issue |
| Assignees | |
| Reporter | AlexGuteniev |

In the following example:
```C++
#include <benchmark/benchmark.h>
#include <cstring>

const char s[] = "....1.......1......1.......1....1......1...1.....1......1.....11....."
    "......1.....1..........1.......1.........1..........1......1...........1.........1.."
    ".......1.........1....1...........1......1........1.....1.......1....1....1..1.....";

static void a(benchmark::State& state) {
    char x[sizeof(s)];
    for (auto _ : state) {
        memcpy(x, s, sizeof(s));  // restore the input on every iteration
        benchmark::DoNotOptimize(x);
        // Hand-written equivalent of std::replace(x, x + sizeof(s), '1', '2')
        for (int i = 0; i < sizeof(s); i++) {
            if (x[i] == '1')
                x[i] = '2';
        }
        benchmark::DoNotOptimize(x);
    }
}
BENCHMARK(a);
```
Adding `#pragma clang loop vectorize(disable)` does not slow the code down much, and with this particular data it actually speeds it up by 1.4x.
The reason is that the vectorized code still stores elements individually, which is inefficient and produces the same branchy code as the scalar version.
See https://quick-bench.com/q/nTz_9lPk7QlCi-M2RJizQR3UJKU. Quick Bench only goes up to Clang 17, but the current trunk still has similar codegen.