I strongly agree with you, Gregor:
* Best precision should remain the default. I lost months finding
the compiler option (in ICC) which switched to LA mode and broke all
my calculations.
* I wonder how SVML behaves on non-Intel platforms? Sleef
provides the same approach but also works on Power and ARM
platforms (and is designed to be extended...).
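
A minimal sketch of how the ULP error mentioned in the quoted message
could be measured for a float32 function. It assumes that the float64
result is accurate enough to serve as the reference value; the helper
name max_ulp_error is hypothetical and not part of NumPy:

```python
import numpy as np

def max_ulp_error(func, x):
    """Largest error of func(x) evaluated in float32, in float32 ULPs.

    Assumption: the float64 evaluation is close enough to the true
    value to be used as the reference.
    """
    x32 = np.asarray(x, dtype=np.float32)
    got = func(x32).astype(np.float64)        # float32 result under test
    ref = func(x32.astype(np.float64))        # float64 reference
    # np.spacing returns the distance to the next representable float32,
    # i.e. the size of one ULP at each reference value.
    ulp = np.spacing(np.abs(ref).astype(np.float32)).astype(np.float64)
    return float(np.max(np.abs(got - ref) / ulp))

x = np.linspace(0.1, 10.0, 10001)
print("sqrt:", max_ulp_error(np.sqrt, x))
print("log: ", max_ulp_error(np.log, x))
```

Running this with and without a fast math backend on the same inputs
would be one way to see how far the low accuracy variants drift from
the default.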
Cheers,
Jerome
On Wed, 28 Jul 2021 12:13:44 +0200
Gregor Thalhammer wrote:
> > On 28.07.2021 at 01:50, Sebastian Berg wrote:
> >
> > Hi all,
> >
> > there is a proposal to add some Intel-specific fast math routines to
> > NumPy:
> >
> > https://github.com/numpy/numpy/pull/19478
>
> Many years ago I wrote a package
> https://github.com/geggo/uvml
> that makes the VML, a fast implementation of transcendental math functions,
> available for numpy. I don’t know if it still compiles.
> It uses Intel VML, designed for processing arrays, not the SVML intrinsics.
> This makes it less machine dependent (optimized implementations are selected
> automatically depending on the availability of, e.g., SSE, AVX, or AVX512):
> you just link to a library. It compiles as an external module and can be
> activated at runtime.
>
> Different precision models can be selected at runtime (globally). I think
> Intel advocates using the LA (low accuracy) mode as a good compromise
> between performance and accuracy. Different people have strongly diverging
> opinions about what to expect.
>
> The speedups possibly gained by these approaches often vaporize in
> non-benchmark applications, as for those functions performance is often
> limited by memory bandwidth, unless all your data stays in CPU cache. By
> default I would go for high accuracy mode, with an option to switch to low
> accuracy if one urgently needs the better performance. But then one should
> use different approaches for speeding up numpy.
>
> Gregor
>
>
> >
> > part of numerical algorithms is that there is always a speed vs.
> > precision trade-off, giving a more precise result is slower.
> >
> > So there is a question what the general precision expectation should be
> > in NumPy. And how much is it acceptable to diverge in the
> > precision/speed trade-off depending on CPU/system?
> >
> > I doubt we can formulate very clear rules here, but any input on what
> > precision you would expect or trade-offs seem acceptable would be
> > appreciated!
> >
> >
> > Some more details
> > -----------------
> >
> > This is mainly interesting e.g. for functions like logarithms,
> > trigonometric functions, or cubic roots.
> >
> > Some basic functions (multiplication, addition) are correct as per the
> > IEEE standard and give the best possible result, but the functions
> > above are typically only correct within very small numerical errors.
> >
> > This is typically measured as "ULP":
> >
> > https://en.wikipedia.org/wiki/Unit_in_the_last_place
> >
> > where 0.5 ULP would be the best possible result.
> >
> >
> > Merging the PR may mean relaxing the current precision slightly in some
> > places. In general Intel advertises 4 ULP of precision (although the
> > actual precision for most functions seems better).
> >
> >
> > Here are two tables, one from glibc and one for the Intel functions:
> >
> > https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> > (Mainly the LA column)
> > https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
> >
> >
> > Different implementations give different accuracy, but formulating some
> > guidelines/expectations (or referencing them) would be useful guidance.
> >
> > For basic
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
--
Jérôme Kieffer
tel +33 476 882 445