Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Xavier Combelle
I never said it was impossible, just very hard.


Le 17/01/2017 à 16:48, Stephan Houben a écrit :
> Hi Xavier,
>
> In this bright age of IEEE-754 compatible CPUs,
> it is certainly possible to achieve reproducible FP.
> I worked for a company whose software produced bit-identical
> results on various CPUs (x86, Sparc, Itanium) and OSes (Linux,
> Solaris, Windows).
>
> The trick is to closely RTFM for your CPU and compiler, in particular
> all those nice
> appendices related to "FPU control words" and "FP consistency models".
>
> For example, if the author of that article had done so, he might have
> learned
> about the "precision control" field of the x87 status register, which
> you can set
> so that all intermediate operations are always represented as 64-bits
> doubles.
> So no double roundings from double-extended precision.
>
> (Incidentally, the x87-internal double-extended precision is another
> fine example where 
> being "more precise on occasion"  usually does not help.)
>
> Frankly, I'm not very impressed with that article.
> I could go into detail, but that's off-topic, and I will try to fight
> the "somebody is *wrong* on the Internet" urge.
>
> Stephan
>
> 2017-01-17 16:04 GMT+01:00 Xavier Combelle  >:
>
>
>> Generally speaking, there are two reasons why people may *not*
>> want an FMA operation.
>> 1. They need their results to be reproducible across
>> compilers/platforms. (the most common reason)
>>
> The reproducibility of floating-point calculations is very hard to
> achieve. A good survey of the problem is
> https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
> It mentions the FMA problem, but that is only part of a bigger picture.
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Gregory P. Smith
Makes sense, thanks!  math.fma() it is. :)

On Tue, Jan 17, 2017, 7:48 AM Stephan Houben  wrote:

> Hi Xavier,
>
> In this bright age of IEEE-754 compatible CPUs,
> it is certainly possible to achieve reproducible FP.
> I worked for a company whose software produced bit-identical
> results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris,
> Windows).
>
> The trick is to closely RTFM for your CPU and compiler, in particular all
> those nice
> appendices related to "FPU control words" and "FP consistency models".
>
> For example, if the author of that article had done so, he might have
> learned
> about the "precision control" field of the x87 status register, which you
> can set
> so that all intermediate operations are always represented as 64-bits
> doubles.
> So no double roundings from double-extended precision.
>
> (Incidentally, the x87-internal double-extended precision is another fine
> example where
> being "more precise on occasion"  usually does not help.)
>
> Frankly, I'm not very impressed with that article.
> I could go into detail, but that's off-topic, and I will try to fight
> the "somebody is *wrong* on the Internet" urge.
>
> Stephan
>
> 2017-01-17 16:04 GMT+01:00 Xavier Combelle :
>
>
> Generally speaking, there are two reasons why people may *not* want an FMA
> operation.
> 1. They need their results to be reproducible across compilers/platforms.
> (the most common reason)
>
> The reproducibility of floating-point calculations is very hard to achieve. A
> good survey of the problem is
> https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
> It mentions the FMA problem, but that is only part of a bigger picture.
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Stephan Houben
Hi Xavier,

In this bright age of IEEE-754 compatible CPUs,
it is certainly possible to achieve reproducible FP.
I worked for a company whose software produced bit-identical
results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris,
Windows).

The trick is to closely RTFM for your CPU and compiler, in particular all
those nice
appendices related to "FPU control words" and "FP consistency models".

For example, if the author of that article had done so, he might have
learned
about the "precision control" field of the x87 status register, which you
can set
so that all intermediate operations are always represented as 64-bits
doubles.
So no double roundings from double-extended precision.

(Incidentally, the x87-internal double-extended precision is another fine
example where
being "more precise on occasion"  usually does not help.)

Frankly, I'm not very impressed with that article.
I could go into detail, but that's off-topic, and I will try to fight
the "somebody is *wrong* on the Internet" urge.

Stephan

2017-01-17 16:04 GMT+01:00 Xavier Combelle :

>
> Generally speaking, there are two reasons why people may *not* want an FMA
> operation.
> 1. They need their results to be reproducible across compilers/platforms.
> (the most common reason)
>
> The reproducibility of floating-point calculations is very hard to achieve. A
> good survey of the problem is
> https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
> It mentions the FMA problem, but that is only part of a bigger picture.
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Xavier Combelle

> Generally speaking, there are two reasons why people may *not* want an
> FMA operation.
> 1. They need their results to be reproducible across
> compilers/platforms. (the most common reason)
>
The reproducibility of floating-point calculations is very hard to achieve.
A good survey of the problem is
https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
It mentions the FMA problem, but that is only part of a bigger picture.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Stephan Houben
Hi Gregory,

2017-01-16 20:28 GMT+01:00 Gregory P. Smith :

> Is there a good reason not to detect single expression multiply adds and
> just emit a new FMA bytecode?
>
Yes ;-) See below.


> Is our goal for floats to strictly match the result of the same operations
> coded in unoptimized C using doubles?
>

I think it should be. This determinism is a feature, i.e. it is of value to
some,
although not to everybody.
The cost of this determinism is a possible loss of performance, but as I
already mentioned
in an earlier mail, I do not believe this cost would be observable in pure
Python code.
And anyway, people who care about numerical performance to that extent are
all using Numpy.


> Or can we be more precise on occasion?
>

Being more precise on occasion is only valuable if the occasion can be
predicted/controlled by the programmer.
(In this I assume you are not proposing that x*y+z would be guaranteed to
produce an FMA on *all* platforms,
even those lacking a hardware FMA. That would be very expensive.)

Generally speaking, there are two reasons why people may *not* want an FMA
operation.
1. They need their results to be reproducible across compilers/platforms.
(the most common reason)
2. The correctness of their algorithm depends on the intermediate rounding
step being done.

As an example of the second, take for example the cross product of two 2D
vectors:

def cross(a, b):
    return a[0]*b[1] - b[0] * a[1]

In exact mathematics, this operation has the property that cross(a, b) ==
-cross(b,a).
In the current Python implementation, this property is preserved.
Synthesising an FMA would destroy it.
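
To make that concrete, here is a small self-contained sketch (the vectors are
contrived for the purpose, and the single-rounding emulation is the
Fraction-based fma from elsewhere in this thread):

from fractions import Fraction

def emulated_fma(x, y, z):
    # Single rounding of x*y + z, finite inputs only.
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def cross_contracted(a, b):
    # What a compiler could effectively emit if it fused the first product
    # into the subtraction: only b[0]*a[1] gets an intermediate rounding.
    return emulated_fma(a[0], b[1], -(b[0] * a[1]))

a = (1.0 + 2.0**-27, 1.0 + 2.0**-26)
b = (1.0, 1.0 + 2.0**-27)

print(cross(a, b) == -cross(b, a))                        # True
print(cross_contracted(a, b) == -cross_contracted(b, a))  # False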

> I guess a similar question may be asked of all C compilers as they too
> could emit an FMA instruction on such expressions... If they don't do it by
> default, that suggests we match them and not do it either.
>

C99 defines #pragmas to let the programmer indicate whether they care about
the strict FP model or not.
So in C99 I can express the following three options:

1. I need an FMA, give it to me even if it needs to be emulated expensively
in software:
    fma(x, y, z)

2. I do NOT want an FMA, please do intermediate rounding:
    #pragma STDC FP_CONTRACT OFF
    x*y + z

3. I don't care if you do intermediate rounding or not, just give me what
is fastest:
    #pragma STDC FP_CONTRACT ON
    x*y + z

Note that a conforming compiler can simply ignore FP_CONTRACT as long as it
never
generates an FMA for "x*y+z". This is what GCC does in -std mode.
It's what I would recommend for Python.

> Regardless, +1 on adding math.fma() either way, as it is an expression of
> precise intent.
>

Yep.

Stephan


> -gps
>
> On Mon, Jan 16, 2017, 10:44 AM David Mertz  wrote:
>
>> My understanding is that NumPy does NOT currently support a direct FMA
>> operation "natively."  However, higher-level routines like
>> `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of
>> FMA within the underlying libraries.
>>
>> On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum 
>> wrote:
>>
>> Does numpy support this?
>>
>> --Guido (mobile)
>>
>> On Jan 16, 2017 7:27 AM, "Stephan Houben"  wrote:
>>
>> Hi Steve,
>>
>> Very good!
>> Here is a version which also handles the nan's, infinities,
>> negative zeros properly.
>>
>> ===
>> import math
>> from fractions import Fraction
>>
>> def fma2(x, y, z):
>>     if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
>>         result = float(Fraction(x)*Fraction(y) + Fraction(z))
>>         if not result and not z:
>>             result = math.copysign(result, x*y+z)
>>     else:
>>         result = x * y + z
>>         assert not math.isfinite(result)
>>     return result
>> ===
>>
>> Stephan
>>
>>
>> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano :
>>
>> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:
>>
>> [...]
>> > So the following would not be a valid FMA fallback
>> >
>> > double bad_fma(double x, double y, double z) {
>> >   return x*y + z;
>> > }
>> [...]
>> > Upshot: if we want to provide a software fallback in the Python code, we
>> > need to do something slow and complicated like musl does.
>>
>> I don't know about complicated. I think this is pretty simple:
>>
>> from fractions import Fraction
>>
>> def fma(x, y, z):
>>     # Return x*y + z with only a single rounding.
>>     return float(Fraction(x)*Fraction(y) + Fraction(z))
>>
>>
>> When speed is not the number one priority and accuracy is important,
>> it's hard to beat the fractions module.
>>
>>
>> --
>> Steve
>> ___
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>
>>
>> ___
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>
>> 

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Sven R. Kunze

On 16.01.2017 20:28, Gregory P. Smith wrote:


Is there a good reason not to detect single expression multiply adds 
and just emit a new FMA bytecode?




Same question here.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Gregory P. Smith
Is there a good reason not to detect single expression multiply adds and
just emit a new FMA bytecode?

Is our goal for floats to strictly match the result of the same operations
coded in unoptimized C using doubles?

Or can we be more precise on occasion?

I guess a similar question may be asked of all C compilers as they too
could emit an FMA instruction on such expressions... If they don't do it by
default, that suggests we match them and not do it either.

Regardless, +1 on adding math.fma() either way, as it is an expression of
precise intent.

-gps

On Mon, Jan 16, 2017, 10:44 AM David Mertz  wrote:

> My understanding is that NumPy does NOT currently support a direct FMA
> operation "natively."  However, higher-level routines like
> `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of
> FMA within the underlying libraries.
>
> On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum 
> wrote:
>
> Does numpy support this?
>
> --Guido (mobile)
>
> On Jan 16, 2017 7:27 AM, "Stephan Houben"  wrote:
>
> Hi Steve,
>
> Very good!
> Here is a version which also handles the nan's, infinities,
> negative zeros properly.
>
> ===
> import math
> from fractions import Fraction
>
> def fma2(x, y, z):
>     if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
>         result = float(Fraction(x)*Fraction(y) + Fraction(z))
>         if not result and not z:
>             result = math.copysign(result, x*y+z)
>     else:
>         result = x * y + z
>         assert not math.isfinite(result)
>     return result
> ===
>
> Stephan
>
>
> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano :
>
> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:
>
> [...]
> > So the following would not be a valid FMA fallback
> >
> > double bad_fma(double x, double y, double z) {
> >   return x*y + z;
> > }
> [...]
> > Upshot: if we want to provide a software fallback in the Python code, we
> > need to do something slow and complicated like musl does.
>
> I don't know about complicated. I think this is pretty simple:
>
> from fractions import Fraction
>
> def fma(x, y, z):
>     # Return x*y + z with only a single rounding.
>     return float(Fraction(x)*Fraction(y) + Fraction(z))
>
>
> When speed is not the number one priority and accuracy is important,
> it's hard to beat the fractions module.
>
>
> --
> Steve
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread David Mertz
My understanding is that NumPy does NOT currently support a direct FMA
operation "natively."  However, higher-level routines like
`numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of
FMA within the underlying libraries.

On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum 
wrote:

> Does numpy support this?
>
> --Guido (mobile)
>
> On Jan 16, 2017 7:27 AM, "Stephan Houben"  wrote:
>
>> Hi Steve,
>>
>> Very good!
>> Here is a version which also handles the nan's, infinities,
>> negative zeros properly.
>>
>> ===
>> import math
>> from fractions import Fraction
>>
>> def fma2(x, y, z):
>>     if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
>>         result = float(Fraction(x)*Fraction(y) + Fraction(z))
>>         if not result and not z:
>>             result = math.copysign(result, x*y+z)
>>     else:
>>         result = x * y + z
>>         assert not math.isfinite(result)
>>     return result
>> ===
>>
>> Stephan
>>
>>
>> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano :
>>
>>> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:
>>>
>>> [...]
>>> > So the following would not be a valid FMA fallback
>>> >
>>> > double bad_fma(double x, double y, double z) {
>>> >   return x*y + z;
>>> > }
>>> [...]
>>> > Upshot: if we want to provide a software fallback in the Python code, we
>>> > need to do something slow and complicated like musl does.
>>>
>>> I don't know about complicated. I think this is pretty simple:
>>>
>>> from fractions import Fraction
>>>
>>> def fma(x, y, z):
>>>     # Return x*y + z with only a single rounding.
>>>     return float(Fraction(x)*Fraction(y) + Fraction(z))
>>>
>>>
>>> When speed is not the number one priority and accuracy is important,
>>> it's hard to beat the fractions module.
>>>
>>>
>>> --
>>> Steve
>>> ___
>>> Python-ideas mailing list
>>> Python-ideas@python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>> ___
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Guido van Rossum
Does numpy support this?

--Guido (mobile)

On Jan 16, 2017 7:27 AM, "Stephan Houben"  wrote:

> Hi Steve,
>
> Very good!
> Here is a version which also handles the nan's, infinities,
> negative zeros properly.
>
> ===
> import math
> from fractions import Fraction
>
> def fma2(x, y, z):
>     if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
>         result = float(Fraction(x)*Fraction(y) + Fraction(z))
>         if not result and not z:
>             result = math.copysign(result, x*y+z)
>     else:
>         result = x * y + z
>         assert not math.isfinite(result)
>     return result
> ===
>
> Stephan
>
>
> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano :
>
>> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:
>>
>> [...]
>> > So the following would not be a valid FMA fallback
>> >
>> > double bad_fma(double x, double y, double z) {
>> >   return x*y + z;
>> > }
>> [...]
>> > Upshot: if we want to provide a software fallback in the Python code, we
>> > need to do something slow and complicated like musl does.
>>
>> I don't know about complicated. I think this is pretty simple:
>>
>> from fractions import Fraction
>>
>> def fma(x, y, z):
>>     # Return x*y + z with only a single rounding.
>>     return float(Fraction(x)*Fraction(y) + Fraction(z))
>>
>>
>> When speed is not the number one priority and accuracy is important,
>> it's hard to beat the fractions module.
>>
>>
>> --
>> Steve
>> ___
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Stephan Houben
Hi Steve,

Very good!
Here is a version which also handles the nan's, infinities,
negative zeros properly.

===
import math
from fractions import Fraction

def fma2(x, y, z):
    if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
        result = float(Fraction(x)*Fraction(y) + Fraction(z))
        if not result and not z:
            result = math.copysign(result, x*y+z)
    else:
        result = x * y + z
        assert not math.isfinite(result)
    return result
===
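
For instance, an illustrative check of the signed-zero handling (the plain
Fraction one-liner loses the sign, since Fraction(-0.0) is just 0):

fma2(-1.0, 0.0, -0.0)                                 # -0.0, as IEEE-754 fma gives
float(Fraction(-1.0)*Fraction(0.0) + Fraction(-0.0))  # 0.0, sign of zero lost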

Stephan


2017-01-16 12:04 GMT+01:00 Steven D'Aprano :

> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:
>
> [...]
> > So the following would not be a valid FMA fallback
> >
> > double bad_fma(double x, double y, double z) {
> >   return x*y + z;
> > }
> [...]
> > Upshot: if we want to provide a software fallback in the Python code, we
> > need to do something slow and complicated like musl does.
>
> I don't know about complicated. I think this is pretty simple:
>
> from fractions import Fraction
>
> def fma(x, y, z):
>     # Return x*y + z with only a single rounding.
>     return float(Fraction(x)*Fraction(y) + Fraction(z))
>
>
> When speed is not the number one priority and accuracy is important,
> it's hard to beat the fractions module.
>
>
> --
> Steve
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Steven D'Aprano
On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:

[...]
> So the following would not be a valid FMA fallback
> 
> double bad_fma(double x, double y, double z) {
>   return x*y + z;
> }
[...]
> Upshot: if we want to provide a software fallback in the Python code, we
> need to do something slow and complicated like musl does.

I don't know about complicated. I think this is pretty simple:

from fractions import Fraction

def fma(x, y, z):
    # Return x*y + z with only a single rounding.
    return float(Fraction(x)*Fraction(y) + Fraction(z))


When speed is not the number one priority and accuracy is important, 
it's hard to beat the fractions module.
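
For example, a contrived case where the single rounding is visible (the plain
multiply rounds away the low-order bit that fma keeps):

x = 1.0 + 2.0**-27
z = -(1.0 + 2.0**-26)
print(x*x + z)       # 0.0, the product was rounded before the add
print(fma(x, x, z))  # 5.551115123125783e-17, i.e. 2.0**-54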


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Stephan Houben
Hi Victor,

The fallback implementations in the various libc take care
to preserve the correct rounding behaviour.

Let me stress that *fused* multiply-add means the specific rounding
behaviour as defined in the IEEE 754-2008 standard
(i.e. with guaranteed *no* intermediate rounding).

So the following would not be a valid FMA fallback

double bad_fma(double x, double y, double z) {
  return x*y + z;
}

Now in practice, people want FMA for two reasons.
1. They need the additional precision.
2. They want the performance of a hardware FMA instruction.

Now, admittedly, the second category would be satisfied with the bad_fma
fallback.
However, I don't think reason 2 is a very compelling one for fma *in pure
Python code*, since
the performance advantage would probably be dwarfed by interpreter overhead.

So I would estimate that approx. 100% of the target audience of math.fma
would want to use
it for the increased accuracy. So providing a fallback which does not, in
fact, give that
accuracy would not make people happy.

Upshot: if we want to provide a software fallback in the Python code, we
need to do something
slow and complicated like musl does. Possibly by actually using the musl
code.

Either that, or we rely on the Python-external libc implementation always.

Stephan


2017-01-16 9:45 GMT+01:00 Victor Stinner :

> 2017-01-15 18:25 GMT+01:00 Juraj Sukop :
> > C99 includes an `fma` function in `math.h` [6] and emulates the calculation
> > if the FMA instruction is not present on the host CPU [7].
>
> If even the libc function falls back to x*y followed by +z, it's
> fine to add such a function to the Python stdlib. It means that Python
> can do the same if the libc lacks an fma() function. In the math
> module, the trend is more to implement missing functions or add
> special code to work around bugs or limitations of libc functions.
>
> Victor
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Victor Stinner
2017-01-15 18:25 GMT+01:00 Juraj Sukop :
> C99 includes an `fma` function in `math.h` [6] and emulates the calculation if
> the FMA instruction is not present on the host CPU [7].

If even the libc function falls back to x*y followed by +z, it's
fine to add such a function to the Python stdlib. It means that Python
can do the same if the libc lacks an fma() function. In the math
module, the trend is more to implement missing functions or add
special code to work around bugs or limitations of libc functions.

Victor
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Mark Dickinson
On Sun, Jan 15, 2017 at 5:25 PM, Juraj Sukop  wrote:
> This proposal is then about adding new `fma` function with the following
> signature to `math` module:
>
> math.fma(x, y, z)

Sounds good to me. Please could you open an issue on the bug tracker
(http://bugs.python.org)?

Thanks,

Mark
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Chris Angelico
On Mon, Jan 16, 2017 at 4:25 AM, Juraj Sukop  wrote:
> There is a simple module for Python 3 demonstrating the fused multiply-add
> operation which was built with a simple `python3 setup.py build` under Linux
> [9].
>
> Any feedback is greatly appreciated!

+1. Just tried it out, and apart from dropping a pretty little
SystemError when I fat-finger the args wrong (a trivial matter of
adding more argument checking), it looks good.

Are there any possible consequences (not counting performance) of the
fall-back? I don't understand all the code in what you linked to, but
I think what's happening is that it goes to great lengths to avoid
intermediate rounding, so the end result is always going to be the
same. If that's the case, yeah, definite +1 on the proposal.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Stephan Houben
Hi Juraj,

I think this would be a very useful addition to the `math` module.

The gating issue is probably C compiler support.
The most important non-C99 C compiler for Python is probably MS Visual
Studio.
And that one appears to support it:

https://msdn.microsoft.com/en-us/library/mt720715.aspx

So +1 on the proposal.

Stephan

2017-01-15 18:25 GMT+01:00 Juraj Sukop :

> Hello!
>
> Fused multiply-add (henceforth FMA) is an operation which calculates the
> product of two numbers and then the sum of the product and a third number
> with just one floating-point rounding. More concretely:
>
> r = x*y + z
>
> The value of `r` is the same as if the RHS was calculated with infinite
> precision and then rounded to a 32-bit single-precision or 64-bit
> double-precision floating-point number [1].
>
> Even though one FMA CPU instruction might be calculated faster than the
> two separate instructions for multiply and add, its main advantage comes
> from the increased precision of numerical computations that involve the
> accumulation of products. Examples which benefit from using FMA are: dot
> product [2], compensated arithmetic [3], polynomial evaluation [4], matrix
> multiplication, Newton's method and many more [5].
>
> C99 includes an `fma` function in `math.h` [6] and emulates the calculation
> if the FMA instruction is not present on the host CPU [7]. PEP 7 states
> that "Python versions greater than or equal to 3.6 use C89 with several
> select C99 features" and that "Future C99 features may be added to this
> list in the future depending on compiler support" [8].
>
> This proposal is then about adding a new `fma` function with the following
> signature to `math` module:
>
> math.fma(x, y, z)
>
> '''Return a float representing the result of the operation `x*y + z` with
> a single rounding error, as defined by the platform C library. The result is
> the same as if the operation was carried out with infinite precision and then
> rounded to a floating-point number.'''
>
> There is a simple module for Python 3 demonstrating the fused multiply-add
> operation which was built with a simple `python3 setup.py build` under Linux
> [9].
>
> Any feedback is greatly appreciated!
>
> Juraj Sukop
>
> [1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
> [2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA.
> 2006
> [3] S. Graillat, Accurate Floating Point Product and Exponentiation. 2007.
> [4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner
> scheme with a Fused Multiply and Add. 2006
> [5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V.
> Lefèvre, G. Melquiond, N. Revol, D. Stehlé, S. Torres. Handbook of
> Floating-Point Arithmetic. 2010. Chapter 5
> [6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft -
> September 7, 2007
> [7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c
> [8] https://www.python.org/dev/peps/pep-0007/
> [9] https://github.com/sukop/fma
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Juraj Sukop
Hello!

Fused multiply-add (henceforth FMA) is an operation which calculates the
product of two numbers and then the sum of the product and a third number
with just one floating-point rounding. More concretely:

r = x*y + z

The value of `r` is the same as if the RHS was calculated with infinite
precision and then rounded to a 32-bit single-precision or 64-bit
double-precision floating-point number [1].

Even though one FMA CPU instruction might be calculated faster than the two
separate instructions for multiply and add, its main advantage comes from
the increased precision of numerical computations that involve the
accumulation of products. Examples which benefit from using FMA are: dot
product [2], compensated arithmetic [3], polynomial evaluation [4], matrix
multiplication, Newton's method and many more [5].
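
As a sketch of why the compensated algorithms [2][3][4] want it (assuming the
proposed math.fma is available; the helper name below is only for illustration),
an FMA recovers the exact rounding error of a floating-point product in a
single operation:

import math

def two_prod(a, b):
    # Split a*b into its rounded value p and the rounding error e, so that
    # a*b == p + e holds exactly (finite inputs, no overflow).
    p = a * b
    e = math.fma(a, b, -p)
    return p, e

Accumulating such error terms is what lets compensated dot products and the
compensated Horner scheme behave almost as if they were computed in twice the
working precision.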

C99 includes an `fma` function in `math.h` [6] and emulates the calculation if
the FMA instruction is not present on the host CPU [7]. PEP 7 states that
"Python versions greater than or equal to 3.6 use C89 with several select
C99 features" and that "Future C99 features may be added to this list in
the future depending on compiler support" [8].

This proposal is then about adding a new `fma` function with the following
signature to `math` module:

math.fma(x, y, z)

'''Return a float representing the result of the operation `x*y + z` with
a single rounding error, as defined by the platform C library. The result is
the same as if the operation was carried out with infinite precision and then
rounded to a floating-point number.'''

There is a simple module for Python 3 demonstrating the fused multiply-add
operation which was built with a simple `python3 setup.py build` under Linux
[9].

Any feedback is greatly appreciated!

Juraj Sukop

[1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
[2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA.
2006
[3] S. Graillat, Accurate Floating Point Product and Exponentiation. 2007.
[4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner
scheme with a Fused Multiply and Add. 2006
[5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V.
Lefèvre, G. Melquiond, N. Revol, D. Stehlé, S. Torres. Handbook of
Floating-Point Arithmetic. 2010. Chapter 5
[6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft -
September 7, 2007
[7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c
[8] https://www.python.org/dev/peps/pep-0007/
[9] https://github.com/sukop/fma
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/