* From: Christian Ullrich

> On February 13, 2016 4:10:34 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> > Christian Ullrich <ch...@chrullrich.net> writes:

> > Lastly, I'd like to see some discussion of what side effects
> > "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> > a magic-elixir-against-crashes-with-no-downsides.
> 
> It tells the math library (in the CRT, no separate libm on Windows)
> not to use the AVX2-based implementations of log() and possibly
> other functions. AIUI, FMA means "fused multiply-add" and is
> apparently something that increases performance and accuracy in
> transcendental functions.
> 
> I can check the CRT source later today and figure out exactly what
> it does.

OK, it turns out that the CRT source MS ships is not quite as complete as I
thought it was (up until 2013, at least), so I had a look at the disassembly.
When the library initializes, it checks whether the CPU supports the FMA
instructions by looking at a certain bit in the CPUID result. If that bit is
set, it sets a flag to use the FMA instructions. Later, in exp(), log(), pow()
and the trigonometric functions, it first checks whether that flag is set, and
if so, uses the AVX-based implementation. If the flag is not set, it falls back
to an SSE2-based one. So, yes, _set_FMA3_enable(0) only and specifically
disables the use of instructions that do not work in the problematic case.
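
A minimal sketch of the dispatch scheme described above, to make the role of
the flag concrete. This is a model, not the CRT's code: every name here
(model_crt_init, model_set_fma3_enable, model_math_path, use_fma3) is
hypothetical, and the real _set_FMA3_enable() may do more than flip a flag.
The only hardware fact assumed is that FMA support is reported in bit 12 of
ECX for CPUID leaf 1.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX bit 12: CPU supports the FMA3 instructions. */
#define CPUID1_ECX_FMA (1u << 12)

enum math_path { PATH_SSE2 = 0, PATH_AVX = 1 };

/* Hypothetical stand-in for the library's internal "use FMA" flag. */
static int use_fma3 = 0;

/* Models library startup: the flag is derived from the CPUID bit alone. */
void model_crt_init(uint32_t cpuid1_ecx)
{
    use_fma3 = (cpuid1_ecx & CPUID1_ECX_FMA) != 0;
}

/* Models the override: 0 forces the SSE2 path, nonzero re-enables FMA. */
int model_set_fma3_enable(int flag)
{
    use_fma3 = (flag != 0);
    return use_fma3;
}

/* Models what exp(), log(), pow() etc. do on entry: check the flag once. */
enum math_path model_math_path(void)
{
    return use_fma3 ? PATH_AVX : PATH_SSE2;
}
```

In this model, calling model_set_fma3_enable(0) after startup changes nothing
except which branch the math functions take, which matches what the
disassembly suggests.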

The bug appears to be that it uses all manner of AVX and AVX2 instructions 
based only on the FMA support flag in CPUID, even though AVX2 has its own bit 
there.

To reiterate: The problem occurs because the library only asks the CPU whether 
it is *able* to perform the AVX instructions, but not whether it is *willing* 
to do so. In this particular situation, the former applies but not the latter, 
because the CPU needs OS support (saving the XMM/YMM registers across context 
switches), and the OS has not declared its support for that.
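
For illustration, here is a sketch of what an "able and willing" check could
look like. It is not what the CRT does (that is the bug), and the function
names are made up for this example. The bit positions are the documented
ones: CPUID leaf 1 ECX bits 12 (FMA), 27 (OSXSAVE) and 28 (AVX), CPUID leaf 7
subleaf 0 EBX bit 5 (AVX2), and XCR0 bits 1 and 2 (SSE and AVX state enabled
by the OS). The functions take the raw register values as arguments so the
logic stays portable; fetching them (for example with the __cpuid/__cpuidex
and _xgetbv intrinsics on MSVC) is left to the caller, and XCR0 may only be
read when OSXSAVE is set.

```c
#include <stdint.h>

#define CPUID1_ECX_FMA      (1u << 12)  /* CPU has FMA3 */
#define CPUID1_ECX_OSXSAVE  (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX      (1u << 28)  /* CPU has AVX */
#define CPUID7_EBX_AVX2     (1u << 5)   /* CPU has AVX2 */
#define XCR0_SSE_STATE      (1ull << 1) /* OS saves XMM registers */
#define XCR0_AVX_STATE      (1ull << 2) /* OS saves upper YMM halves */

/* "Willing": the OS has declared that it preserves XMM and YMM state. */
int os_saves_ymm(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;

    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return 0;               /* xcr0 is meaningless without OSXSAVE */
    return (xcr0 & need) == need;
}

/* "Able and willing": every feature bit is present AND the OS cooperates. */
int avx2_fma_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0)
{
    uint32_t need = CPUID1_ECX_FMA | CPUID1_ECX_AVX;

    if ((cpuid1_ecx & need) != need)
        return 0;               /* FMA or AVX missing */
    if (!(cpuid7_ebx & CPUID7_EBX_AVX2))
        return 0;               /* AVX2 has its own bit */
    return os_saves_ymm(cpuid1_ecx, xcr0);
}
```

In the problematic situation the CPU reports FMA, AVX and AVX2, but OSXSAVE
is clear (or XCR0 lacks the AVX state bit), so avx2_fma_usable() returns 0
where a check of the FMA bit alone would say yes.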

The downside to disabling the AVX implementations is a performance loss
compared to using them. I ran a microbenchmark (avg(log(x)) from
generate_series(1,1e8)), and the result was that with FMA enabled, it is ~5.5%
faster than without.

-- 
Christian

