https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88153

Daniel Fruzynski <bugzi...@poradnik-webmastera.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #4 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
I checked the man page for errno and it contains the following sentence:

"Valid error numbers are all nonzero; errno is never set to zero by any system
call or library function."

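This means that code like mine from Comment 0 should do the trick. Because errno
is never reset to zero, setting it to EDOM once is indistinguishable, from the
caller's point of view after the loop, from setting it once per negative input.
The code checks every processed value for a negative sign, accumulates the
result in a temporary variable, and calls sqrt(-1) once at the end if any of
the values was negative.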
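A minimal portable sketch of the same idea without SSE intrinsics, assuming a C library (such as glibc) that reports sqrt() domain errors through errno. The function names sqrt_each and sqrt_deferred are hypothetical, not from Comment 0. sqrt_each mirrors the plain loop; sqrt_deferred writes NaN for negative inputs without calling sqrt() on them, ORs the sign test into a flag, and issues a single sqrt(-1.0) at the end. The sketch only compares results and errno; it does not try to reproduce floating-point exception flags.

```cpp
#include <cerrno>
#include <cmath>
#include <cstddef>
#include <limits>

// Reference behaviour: one std::sqrt call per element.
// Every negative input sets errno = EDOM (assuming MATH_ERRNO is in effect).
void sqrt_each(const double* src, double* dest, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dest[i] = std::sqrt(src[i]);
}

// Hypothetical deferred variant: negative inputs get NaN without calling
// std::sqrt, their presence is remembered in a flag, and errno is set once.
void sqrt_deferred(const double* src, double* dest, std::size_t n)
{
    bool any_negative = false;
    for (std::size_t i = 0; i < n; ++i)
    {
        const bool neg = src[i] < 0.0;  // false for -0.0 and NaN, like the SSE compare
        any_negative |= neg;
        dest[i] = neg ? std::numeric_limits<double>::quiet_NaN()
                      : std::sqrt(src[i]);  // non-negative or NaN input: no domain error
    }
    if (any_negative)
    {
        volatile double minus_one = -1.0;  // volatile keeps the call from being folded
        (void)std::sqrt(minus_one);        // single domain error: errno = EDOM
    }
}
```

Calling both functions on the same input with errno cleared beforehand should leave the same value in errno: EDOM if any input was negative, otherwise whatever it was before (errno is never set to zero).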

I have created a small benchmark:

[code]
#include <benchmark/benchmark.h>
#include <math.h>
#include <emmintrin.h>

#define SIZE 160

// _mm_load_pd/_mm_store_pd require 16-byte aligned addresses
alignas(16) double src[SIZE];
alignas(16) double dest[SIZE];

static void BM_sqrt(benchmark::State& state)
{
    for (auto _ : state)
    {
        for (int n = 0; n < SIZE; ++n)
            dest[n] = sqrt(src[n]);
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sqrt);

static void BM_sse_sqrt_errno(benchmark::State& state)
{
    for (auto _ : state)
    {
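        int m = 0;  // OR of sign-test bits over all processed pairs
        for (int n = 0; n < SIZE; n += 2)
        {
            __m128d v = _mm_load_pd(&src[n]);
            __m128d vs = _mm_sqrt_pd(v);                     // sqrtpd never touches errno
            __m128d vn = _mm_cmplt_pd(v, _mm_setzero_pd());  // all-ones lanes where v < 0
            m |= _mm_movemask_pd(vn);
            _mm_store_pd(&dest[n], vs);
        }
        if (m)
            sqrt(-1.0);  // at least one negative input: set errno = EDOM once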
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sse_sqrt_errno);

static void BM_sse_sqrt(benchmark::State& state)
{
    for (auto _ : state)
    {
        for (int n = 0; n < SIZE; n += 2)
        {
            __m128d v = _mm_load_pd(&src[n]);
            __m128d vs = _mm_sqrt_pd(v);
            _mm_store_pd(&dest[n], vs);
        }
        benchmark::ClobberMemory();
    }
}
// Register the function as a benchmark
BENCHMARK(BM_sse_sqrt);

BENCHMARK_MAIN();
[/code]

This code was compiled using gcc 4.8.5 with the following command:
g++ -std=c++11 -o test test.cc -O3 -I/benchmark/include/ -L/benchmark/lib/ -lbenchmark

Results for SIZE = 16 (loops unrolled):

---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_sqrt                   86 ns         86 ns    7188074
BM_sse_sqrt_errno         15 ns         15 ns   48084834
BM_sse_sqrt               15 ns         15 ns   47797778

Results for SIZE = 160 (loops not unrolled):

---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_sqrt                  995 ns        995 ns     839866
BM_sse_sqrt_errno        156 ns        156 ns    4348870
BM_sse_sqrt              144 ns        144 ns    4549107

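As you can see, BM_sse_sqrt_errno is about 6x faster than BM_sqrt (15 ns vs.
86 ns for SIZE = 16, 156 ns vs. 995 ns for SIZE = 160) and close to BM_sse_sqrt
(identical for SIZE = 16, about 8% slower for SIZE = 160). If the optimization
implemented in BM_sse_sqrt_errno satisfies the error handling requirements for
sqrt(), it is definitely worth implementing in gcc.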
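One way to check whether the BM_sse_sqrt_errno approach matches the scalar error behaviour is to run both kernels on the same input and compare the outputs and errno. This is a sketch under assumptions: an x86-64 target with SSE2, a libc that sets errno for sqrt() domain errors, and unaligned loads/stores (_mm_loadu_pd/_mm_storeu_pd) instead of the aligned ones used in the benchmark. The helper names sse_sqrt_errno, scalar_sqrt and same_behaviour are hypothetical. Like the benchmark, the SSE kernel assumes an even element count.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstddef>
#include <emmintrin.h>

// Kernel of BM_sse_sqrt_errno as a function (hypothetical name).
// Assumption: unaligned loads/stores so callers need not align buffers.
void sse_sqrt_errno(const double* src, double* dest, std::size_t n)
{
    assert(n % 2 == 0);  // two doubles per iteration, no scalar tail
    int m = 0;
    for (std::size_t i = 0; i < n; i += 2)
    {
        __m128d v = _mm_loadu_pd(&src[i]);
        __m128d vs = _mm_sqrt_pd(v);                     // NaN for negative lanes, no errno
        __m128d vn = _mm_cmplt_pd(v, _mm_setzero_pd());  // all-ones lanes where v < 0
        m |= _mm_movemask_pd(vn);                        // collect the sign-test bits
        _mm_storeu_pd(&dest[i], vs);
    }
    if (m)
    {
        volatile double minus_one = -1.0;  // volatile keeps the call from being folded
        (void)std::sqrt(minus_one);        // one domain error: errno = EDOM
    }
}

// Scalar reference, equivalent to the BM_sqrt loop (hypothetical name).
void scalar_sqrt(const double* src, double* dest, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dest[i] = std::sqrt(src[i]);
}

// Hypothetical check: same results (NaN in the same places) and the same
// final errno. Sketch limit: n <= 64.
bool same_behaviour(const double* src, std::size_t n)
{
    assert(n <= 64);
    double a[64], b[64];
    errno = 0;
    scalar_sqrt(src, a, n);
    const int errno_scalar = errno;
    errno = 0;
    sse_sqrt_errno(src, b, n);
    const int errno_sse = errno;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (std::isnan(a[i]) != std::isnan(b[i]))
            return false;
        if (!std::isnan(a[i]) && a[i] != b[i])  // sqrtsd and sqrtpd are both correctly rounded
            return false;
    }
    return errno_scalar == errno_sse;
}
```

Inputs worth trying include negative values, -0.0 (which is not a domain error for sqrt() and is not less than zero for the compare), and all-positive arrays, where errno should stay untouched.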
