https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120475
Bug ID: 120475
Summary: vector<bool> is 60x slower with ASan detect_stack_use_after_return=1
Product: gcc
Version: 13.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dani at danielbertalan dot dev
Target Milestone: ---
When compiled with -O2 -fsanitize=address -fsanitize=undefined, the code below
runs about 60 times slower with ASAN_OPTIONS=detect_stack_use_after_return=1
(the default since GCC 13) than with stack-use-after-return checking disabled.
Benchmark 1: env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug
  Time (mean ± σ):     10.443 s ±  0.846 s    [User: 10.432 s, System: 0.009 s]
  Range (min … max):   10.056 s … 12.677 s    10 runs
Benchmark 2: env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug
  Time (mean ± σ):     167.9 ms ±   3.7 ms    [User: 161.7 ms, System: 5.9 ms]
  Range (min … max):   161.5 ms … 174.1 ms    18 runs
Summary
  env ASAN_OPTIONS=detect_stack_use_after_return=0 ./bug ran
   62.20 ± 5.22 times faster than env ASAN_OPTIONS=detect_stack_use_after_return=1 ./bug
GCC version: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Target: aarch64-linux-gnu
The benchmark numbers are from an AWS instance with a Graviton 2 processor.
---
// Godbolt: https://godbolt.org/z/5jGc7rc9v
#include <vector>
#include <iostream>

[[gnu::noinline, gnu::noipa]] void test(std::vector<int> &a) {
    const int M = 2e6 + 10;
    std::vector<bool> isprime(M+1, true);
    isprime[0] = isprime[1] = false;
    for (int i = 2; i <= M; i++) {
        if (isprime[i] && (long long) i*i <= M) {
            for (int j = i*i; j <= M; j+=i) isprime[j] = false;
        }
    }
    int ans = 1;
    std::vector<int> even, res;
    for (int i = 0; i < a.size(); i++) {
        if (a[i]%2) even.push_back(a[i]);
    }
    res = {a[0]};
    for (int e : a) {
        if (ans < 2 && !(e%2)) {
            for (int& j : even) {
                if (isprime[e+j]) {
                    ans = 2;
                    res.clear(); res.push_back(e); res.push_back(j);
                    break;
                }
            }
        }
    }
    std::cout << ans << "\n";
    for (const int& e : res) std::cout << e << " ";
}

int main() {
    std::vector<int> v = {2,3};
    test(v);
}
perf shows that 98% of the time is spent in __asan_stack_malloc_2, which is
called from vector<bool>::operator[]. It looks like operator[] becomes too
large to be considered for inlining at -O2.