https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88153
Daniel Fruzynski <bugzi...@poradnik-webmastera.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #4 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
I checked the man page for errno and it has the following sentence:
"Valid error numbers are all nonzero; errno is never set to zero by any system
call or library function."

This means that code like mine from Comment 0 should do the trick: it checks
every processed value for being negative, stores the status in a temporary
variable, and calls sqrt(-1) once at the end if any of these values was
negative.

I have created a small benchmark:

[code]
#include <benchmark/benchmark.h>
#include <math.h>
#include <emmintrin.h>

#define SIZE 160

double src[SIZE];
double dest[SIZE];

// Baseline: scalar sqrt() with the usual errno handling.
static void BM_sqrt(benchmark::State& state)
{
    for (auto _ : state)
    {
        for (int n = 0; n < SIZE; ++n)
            dest[n] = sqrt(src[n]);
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sqrt);

// SSE2 sqrt that records negative inputs and sets errno once after the loop.
static void BM_sse_sqrt_errno(benchmark::State& state)
{
    for (auto _ : state)
    {
        int m = 0;
        for (int n = 0; n < SIZE; n += 2)
        {
            __m128d v = _mm_load_pd(&src[n]);
            __m128d vs = _mm_sqrt_pd(v);
            __m128d vn = _mm_cmplt_pd(v, _mm_setzero_pd());
            m |= _mm_movemask_pd(vn);
            _mm_store_pd(&dest[n], vs);
        }
        if (m)
            sqrt(-1.0);
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sse_sqrt_errno);

// SSE2 sqrt with no errno handling at all.
static void BM_sse_sqrt(benchmark::State& state)
{
    for (auto _ : state)
    {
        for (int n = 0; n < SIZE; n += 2)
        {
            __m128d v = _mm_load_pd(&src[n]);
            __m128d vs = _mm_sqrt_pd(v);
            _mm_store_pd(&dest[n], vs);
        }
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sse_sqrt);

BENCHMARK_MAIN();
[/code]

This code was compiled using gcc 4.8.5, with the following options:

g++ -std=c++11 -o test test.cc -O3 -I/benchmark/include/ -L/benchmark/lib/ -lbenchmark

Results for SIZE = 16 (loops unrolled):

---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_sqrt                   86 ns         86 ns    7188074
BM_sse_sqrt_errno         15 ns         15 ns   48084834
BM_sse_sqrt               15 ns         15 ns   47797778

Results for SIZE = 160 (loops not unrolled):

---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_sqrt                  995 ns        995 ns     839866
BM_sse_sqrt_errno        156 ns        156 ns    4348870
BM_sse_sqrt              144 ns        144 ns    4549107

As you can see, BM_sse_sqrt_errno is roughly six times faster than BM_sqrt in
both runs and close to BM_sse_sqrt (identical at SIZE = 16, about 8% slower at
SIZE = 160). If the optimization implemented in BM_sse_sqrt_errno satisfies the
error handling requirements for sqrt(), it is definitely worth implementing in
gcc.
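For anyone who wants to check the errno behaviour separately from the timing, here is a minimal sketch of the same deferred-errno idea as a standalone function. The name sqrt_array_deferred_errno is hypothetical and is not part of GCC or of the benchmark above. It also differs from the benchmark in two ways, both assumptions made for illustration: it writes errno = EDOM directly instead of calling sqrt(-1.0), so the result does not depend on whether the C library sets errno for math functions, and it uses unaligned loads and stores so callers do not have to align their buffers.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper, for illustration only.
// Computes dest[i] = sqrt(src[i]) two doubles at a time and reports a
// domain error through errno once, after the loop has finished.
// Assumption: n is even (no scalar tail loop is provided).
inline void sqrt_array_deferred_errno(const double* src, double* dest,
                                      std::size_t n)
{
    assert(n % 2 == 0);

    int negative_seen = 0;  // nonzero once any lane held a value < 0
    const __m128d zero = _mm_setzero_pd();

    for (std::size_t i = 0; i < n; i += 2)
    {
        __m128d v = _mm_loadu_pd(src + i);
        // The compare yields an all-ones lane where v < 0; movemask packs
        // the lane sign bits into the low two bits of an int.
        negative_seen |= _mm_movemask_pd(_mm_cmplt_pd(v, zero));
        _mm_storeu_pd(dest + i, _mm_sqrt_pd(v));
    }

    // errno is only ever set, never cleared, so one write at the end is
    // indistinguishable to the caller from a write per negative element.
    if (negative_seen)
        errno = EDOM;
}
```

The comparison is false for -0.0 and for NaN inputs, so neither sets errno here; negative inputs produce NaN in dest and leave errno == EDOM after the call, while errno is untouched if no input was negative.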