Issue 169935
Summary [x86] Vectorization of `std::replace` and its hand-written equivalent is not very useful and is often even somewhat harmful
Labels new issue
Assignees
Reporter AlexGuteniev
In the following example:
```C++
const char s[] = "....1.......1......1.......1....1......1...1.....1......1.....11....."
 "......1.....1..........1.......1.........1..........1......1...........1.........1.."
 ".......1.........1....1...........1......1........1.....1.......1....1....1..1.....";

static void a(benchmark::State& state) {
  char x[sizeof(s)];

  for (auto _ : state) {
    memcpy(x, s, sizeof(s));
    benchmark::DoNotOptimize(x);
    for (int i = 0; i < sizeof(s); i++) {
      if (x[i] == '1')
        x[i] = '2';
    }
    benchmark::DoNotOptimize(x);
  }
}
``` 
Adding `#pragma clang loop vectorize(disable)` before the loop does not slow it down much, and with this particular data it actually speeds it up by 1.4x.
The reason is that the vectorized code performs the stores one element at a time, which is not efficient and produces the same branchy code as the scalar loop.
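
For reference, here is a minimal sketch of the variants being compared, pulled out into standalone functions. The function names (`replace_default`, `replace_no_vectorize`, `replace_unconditional`, `check_variants`) are hypothetical and not part of the benchmark above, and taking a pointer and a `std::size_t` length instead of the fixed-size array may change the generated code. The third variant is an assumption for illustration only: it stores to every element unconditionally, which is the form that maps naturally onto a vector compare, a blend, and a full-width store. The compiler generally cannot make that transformation on its own, since it introduces writes that the original loop does not perform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Variant 1: the loop from the report; the compiler may vectorize it.
void replace_default(char* x, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    if (x[i] == '1')
      x[i] = '2';
  }
}

// Variant 2: the same loop with loop vectorization disabled.
// The pragma is Clang-specific, so it is guarded for other compilers.
void replace_no_vectorize(char* x, std::size_t n) {
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
  for (std::size_t i = 0; i < n; i++) {
    if (x[i] == '1')
      x[i] = '2';
  }
}

// Variant 3 (assumption, not from the report): store unconditionally,
// writing back the old value when there is no match. This changes which
// bytes are written, so it is not a drop-in replacement in every context.
void replace_unconditional(char* x, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    x[i] = (x[i] == '1') ? '2' : x[i];
  }
}

// Sanity check: all three variants must agree with std::replace.
void check_variants() {
  const char src[] = "..1...11.1....1";
  char a[sizeof(src)], b[sizeof(src)], c[sizeof(src)], ref[sizeof(src)];
  std::memcpy(a, src, sizeof(src));
  std::memcpy(b, src, sizeof(src));
  std::memcpy(c, src, sizeof(src));
  std::memcpy(ref, src, sizeof(src));

  replace_default(a, sizeof(src));
  replace_no_vectorize(b, sizeof(src));
  replace_unconditional(c, sizeof(src));
  std::replace(ref, ref + sizeof(src), '1', '2');

  assert(std::memcmp(a, ref, sizeof(src)) == 0);
  assert(std::memcmp(b, ref, sizeof(src)) == 0);
  assert(std::memcmp(c, ref, sizeof(src)) == 0);
}
```

All three functions produce the same result; only the store pattern differs. Whether the unconditional variant is actually faster would need to be measured on the target in question.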

See https://quick-bench.com/q/nTz_9lPk7QlCi-M2RJizQR3UJKU. Quick Bench only goes up to Clang 17, but the current trunk still has similar codegen.