Re: [1003.1(2016)/Issue7+TC2 0001178]: atan2: Description of IEC 60559 Floating-Point option is unclear

2017-12-17 Thread Bruce Evans

On Sun, 17 Dec 2017, Austin Group Bug Tracker wrote:


> Summary: atan2: Description of IEC 60559 Floating-Point option is unclear
> Description:
> What is the meaning of the following statement:
>     If the IEC 60559 Floating-Point option is supported, y/x should be
>     returned.


This is clear, and wrong.  It says that (if the result underflows) then the
implementation should not try to return a correctly rounded result, but
should return the approximation y/x correctly rounded.  It is unclear whether
these results are the same even in the default rounding mode, but it is
clear that y/x can give the wrong result in the other rounding modes.
Elsewhere the standard mostly avoids over-specifying this, and leaves it to
quality of implementation.

Example: if y = DBL_MIN and x = 2, then y/x underflows to DBL_MIN/2,
but the infinite-precision result for atan2(y, x) is slightly larger,
so in round-towards-plus-infinity mode the result should never be y/x,
but must larger.  How much larger?  About (y/x)**2/3 times larger.
This is still far below DBL_MIN, so we don't have to worry about the
rounding giving a non-underflowing result.  Also the correction is
much smaller than 1 double precision ULP, so the correctly rounded
result is only 1 ULP higher than y/x.  In the default round-to-nearest
mode, the correctly rounded result is y/x since y/x is exact and the
correction is tiny.
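
The example can be checked without relying on any particular libm by
bounding the arctangent with exact rational arithmetic.  The Python sketch
below is an illustration, not part of the original message: the names ULP,
lo, hi, below and above are introduced here, and it assumes IEEE 754
binary64 floats with subnormals enabled.  It uses the series bound
t - t**3/3 < atan(t) < t for 0 < t < 1.  That bound places the exact value
slightly below y/x for positive arguments, so on this input the directed
mode in which the correctly rounded result leaves y/x is the downward one;
the point about directed rounding is unchanged.

```python
from fractions import Fraction
import math
import sys

DBL_MIN = sys.float_info.min        # 2**-1022, smallest positive normal double
ULP = Fraction(1, 2**1074)          # spacing of doubles below DBL_MIN

y, x = DBL_MIN, 2.0
q = y / x                           # IEEE quotient, DBL_MIN/2
t = Fraction(y) / Fraction(x)       # exact rational quotient
quotient_is_exact = Fraction(q) == t

# Alternating-series bound, valid for 0 < t < 1: lo < atan(t) < hi.
lo, hi = t - t**3 / 3, t

# Indices (in units of ULP) of the doubles bracketing every value in (lo, hi).
below = math.floor(lo / ULP)
above = math.ceil(hi / ULP)
brackets_adjacent = (above - below == 1)

atan2_round_down = math.ldexp(below, -1074)   # also round-toward-zero here
atan2_round_up = math.ldexp(above, -1074)
# hi is representable and the gap to it is under t**3/3; if that is
# below ULP/2, round-to-nearest picks hi.
if above * ULP == hi and hi - lo < ULP / 2:
    atan2_round_nearest = atan2_round_up
else:
    atan2_round_nearest = None

# Relative size of the correction, (y/x)**2/3, against 2**-53.
relative_correction = t**2 / 3
correction_is_tiny = relative_correction < Fraction(1, 2**53)

print("y/x exact:", quotient_is_exact)
print("round up == y/x:", atan2_round_up == q)
print("round nearest == y/x:", atan2_round_nearest == q)
print("round down == y/x:", atan2_round_down == q)
```

Only the downward (and toward-zero) result differs from y/x here, by one
subnormal ULP.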



> By the way, in the previous paragraph:
>     If the correct value would cause underflow, a range error may occur, and
>     atan(), atan2f(), and atan2l() shall return an implementation-defined value
>     no greater in magnitude than DBL_MIN, FLT_MIN, and LDBL_MIN, respectively.
>
> it seems atan() should be atan2().


I can't find this in old versions of the POSIX standard, and "correct
value would cause underflow" is another over-specification.

"correct value" seems to be unspecified, and for atan2() it would be
unclear when it would cause underflow even if it were specified.  Underflow
and overflow thresholds are only clear for functions like exp(), where there
are only 2 of them (for each supported rounding mode), and old versions of
the standard seem to do the right thing by only specifying "would cause
overflow/underflow" for such functions.

For 2-arg functions like atan2(y, x), there is a different underflow
threshold for each x.  It is impossible to calculate all of them in
advance, and difficult to calculate them at runtime except for simple
cases like x = 2.  Calculating them requires doing perfect-enough rounding
near the threshold.  Underflow for y/x is easy to determine (by doing the
operation and depending on IEC 60559's perfect rounding for this operation),
but y/x isn't the "correct value" for the function.
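
The contrast between the two notions of underflow can be made concrete.
In the Python sketch below (an illustration with hypothetical names
quotient_is_tiny and exact_atan2_is_tiny, assuming IEEE 754 binary64 and
finite x > 0, y > 0 with y/x < 1), the test on y/x is a single division,
while the test on the exact function value needs a bound on the
arctangent.  At the boundary, for instance y = 2*DBL_MIN with x = 2, the
two disagree: the quotient is exactly DBL_MIN, but the exact arctangent is
strictly smaller.

```python
from fractions import Fraction
import sys

DBL_MIN = sys.float_info.min   # 2**-1022, smallest positive normal double

def quotient_is_tiny(y, x):
    """Underflow test for y/x: do the division and compare with DBL_MIN."""
    q = y / x                  # correctly rounded by IEEE 754 arithmetic
    return y != 0.0 and abs(q) < DBL_MIN

def exact_atan2_is_tiny(y, x):
    """True if the exact atan2(y, x) is provably below DBL_MIN.

    Assumes x > 0, y > 0 and y/x < 1, and uses atan(t) < t, so it only
    settles the cases where the exact quotient t is at most DBL_MIN.
    """
    t = Fraction(y) / Fraction(x)
    return t <= Fraction(DBL_MIN)

# The y at which y/x stops being tiny depends on x.
cases = [(DBL_MIN, 2.0), (2 * DBL_MIN, 2.0), (2 * DBL_MIN, 3.0), (3 * DBL_MIN, 3.0)]
for y, x in cases:
    print(x, quotient_is_tiny(y, x), exact_atan2_is_tiny(y, x))
```

The second helper deliberately answers only the easy cases; deciding the
remaining ones is the hard part described above.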

These bugs seem to be missing in C99 through C11 (Annexes for IEC 60559).
C standards are rather too silent about overflows and underflows involving
finite args, since nothing can be specified without overspecifying the
accuracy including the rounding.

Bruce



Re: [1003.1(2016)/Issue7+TC2 0001178]: atan2: Description of IEC 60559 Floating-Point option is unclear

2017-12-17 Thread Bruce Evans

On Mon, 18 Dec 2017, I wrote:


> ...
> Example: if y = DBL_MIN and x = 2, then y/x underflows to DBL_MIN/2,
> but the infinite-precision result for atan2(y, x) is slightly larger,
> so in round-towards-plus-infinity mode the result should never be y/x,
> but must larger.  How much larger?  About (y/x)**2/3 times larger.


Er, "must be larger", and "(y/x)**2/3 times y/x larger".  The factor
is the relative error.  It is a tiny fraction of 1 ULP.


> This is still far below DBL_MIN, so we don't have to worry about the
> rounding giving a non-underflowing result.  Also the correction is
> much smaller than 1 double precision ULP, so the correctly rounded
> result is only 1 ULP higher than y/x.  In the default round-to-nearest
> mode, the correctly rounded result is y/x since y/x is exact and the
> correction is tiny.


Bruce



Re: [1003.1(2016)/Issue7+TC2 0001178]: atan2: Description of IEC 60559 Floating-Point option is unclear

2019-03-01 Thread Vincent Lefevre
On 2019-02-15 10:19:03 +, Austin Group Bug Tracker wrote:
> New proposed resolution:
> 
> On page 611 line 21228 section atan2(), change:
>
>     [MX]If the correct value would cause underflow, a range error may
>     occur, and atan(), atan2f(), and atan2l() shall return an
>     implementation-defined value no greater in magnitude than DBL_MIN,
>     FLT_MIN, and LDBL_MIN, respectively.[/MX]
>
>     [MXX]If the IEC 60559 Floating-Point option is supported, y/x should
>     be returned.[/MXX]
>
> to:
>
>     If the correct value would cause underflow, a range error may
>     occur, and atan2(), atan2f(), and atan2l() shall return an
>     implementation-defined value no greater in magnitude than DBL_MIN,
>     FLT_MIN, and LDBL_MIN, respectively. [MXX]If the IEC 60559
>     Floating-Point option is supported, y/x should be returned.[/MXX]

I don't understand the recommendation "y/x should be returned" when
correct rounding would be more accurate. Although in case of underflow
atan2(y,x) is mathematically very close to y/x, there may be a
difference in correct rounding when the exact value of y/x is the
midpoint of two machine numbers (which is possible in the subnormal
range). At the least, the text should offer the choice between returning
y/x correctly rounded and returning atan2(y,x) correctly rounded.
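
A concrete midpoint case can be sketched in Python with exact rational
arithmetic, so that no libm is involved.  The names TINY and ULP and the
chosen inputs are illustrative assumptions, and IEEE 754 binary64 with
round-to-nearest-even and subnormals is assumed.  With y = 3 * 2**-1074
and x = 2, the exact quotient is 1.5 subnormal ULPs, which is a tie.
Since atan(t) < t for t > 0, the exact arctangent lies just below the tie
and rounds the other way.

```python
from fractions import Fraction
import math

TINY = 5e-324                    # 2**-1074, smallest positive subnormal double
ULP = Fraction(1, 2**1074)

y, x = 3 * TINY, 2.0             # y = 3 * 2**-1074 is exactly representable
q = y / x                        # exact quotient is a tie; hardware rounds to even
t = Fraction(y) / Fraction(x)
quotient_is_midpoint = (t / ULP == Fraction(3, 2))

# For 0 < t < 1: t - t**3/3 < atan(t) < t, so here 1 < atan(t)/ULP < 3/2.
lo = t - t**3 / 3
if quotient_is_midpoint and lo / ULP > 1:
    atan2_nearest = math.ldexp(1.0, -1074)   # nearest double to the exact value
else:
    atan2_nearest = None

print("y/x rounded:", q / TINY, "ULP")
print("atan2 correctly rounded:", atan2_nearest / TINY, "ULP")
```

Here y/x rounds to 2 ULPs under ties-to-even, while the correctly rounded
arctangent is 1 ULP.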

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)