[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2021-08-18 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #13 from Vincent Lefèvre  ---
(In reply to Jakub Jelinek from comment #12)
> That isn't representable in the GCC internal representation, which pretends
> the type has fixed 106 bit precision
[...]

So, if I understand correctly, this is due to a limitation of GCC internals.

> The only way around that would be to actually represent it in GCC internal
> representation as sum of two doubles and rewrite all operations on this mode
> to treat it specially.  That is a lot of work and given that powerpc64le is
> switching from this floating point format to IEEE quad long double format,
> I'm certain nobody is willing to spend that much time (months) on it.

Well, if implementing this is difficult, perhaps LDBL_MAX could be left as is,
and one could consider, for instance, that an operation whose result would be
0x1.fffffffffffff7ffffffffffffcp+1023 overflows (which is AFAIK undefined
behavior for non-IEEE-754 formats, so that this would be conforming). But this
should be documented.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2021-08-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #12 from Jakub Jelinek  ---
(In reply to Vincent Lefèvre from comment #11)
> In addition to the maximum exponent issue, for LDBL_MAX following the defect
> report, instead of
> 
>   0x1.fffffffffffff7ffffffffffff8p+1023
> 
> I would expect
> 
>   0x1.fffffffffffff7ffffffffffffcp+1023 = DBL_MAX + DBL_MAX * DBL_EPSILON / 4
> 
> as it is larger (it has one more trailing 1) and representable.

That isn't representable in the GCC internal representation, which pretends the
type has fixed 106 bit precision (like double has 53 bit precision),
0x1.fffffffffffff7ffffffffffffcp+1023 needs 107 bit precision (and generally
the type has variable precision).
The only way around that would be to actually represent it in GCC internal
representation as sum of two doubles and rewrite all operations on this mode to
treat it specially.  That is a lot of work and given that powerpc64le is
switching from this floating point format to IEEE quad long double format, I'm
certain nobody is willing to spend that much time (months) on it.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2021-08-18 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #11 from Vincent Lefèvre  ---
In addition to the maximum exponent issue, for LDBL_MAX following the defect
report, instead of

  0x1.fffffffffffff7ffffffffffff8p+1023

I would expect

  0x1.fffffffffffff7ffffffffffffcp+1023 = DBL_MAX + DBL_MAX * DBL_EPSILON / 4

as it is larger (it has one more trailing 1) and representable.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-20 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #10 from Vincent Lefèvre  ---
(In reply to jos...@codesourcery.com from comment #9)
> That wording defines what "normalized" is, so that values with maximum 
> exponent in this case don't count as normalized because not all values 
> with that exponent in the floating-point model are representable, and 
> defines LDBL_MAX so that it doesn't need to be normalized (and in this 
> case isn't, by that definition of normalized).  The definition of 
> LDBL_MAX_EXP is unchanged; it still just requires 2^(LDBL_MAX_EXP-1) to be 
> representable, without requiring it to be normalized.

This would be pretty useless as a definition. This would mean that emin is in
the "normalized" range (due to the LDBL_MIN_EXP definition), but one doesn't
have any guarantee for the larger exponents.

Thus a type that contains only 0, the normalized values of exponent emin,
1+LDBL_EPSILON, and 2^(LDBL_MAX_EXP-1) could be a valid type (with, say, long
double = double = float), precisely because you assume that emax has no
relation with the normalized values.

Note that the standard doesn't specify a range for the property related to
LDBL_DIG, and this property is obviously incorrect for *any* floating-point
number with q decimal digits. In particular, the property is not satisfied in
general when the target is in the range of the subnormal numbers. So, I don't
expect it to be necessarily satisfied outside the range of the normalized
numbers.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-20 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #9 from joseph at codesourcery dot com  ---
On Tue, 20 Jun 2017, vincent-gcc at vinc17 dot net wrote:

> > The current proposed wording for DR#467 
> >  changes 
> > "maximum representable finite floating-point number, [ math formula ]" to 
> > "maximum representable finite floating-point number; if that value is 
> > normalized, its value is [ math formula ],", which I think addresses the 
> > first issue without requiring any change to LDBL_MAX_EXP;
> 
> This change concerns the LDBL_MAX definition. However, there is another issue,
> related to the floating-point model 5.2.4.2.2p2 and p3. I assume that the
> intent of this definition is to require that *all* normalized floating-point
> numbers from this model are elements of the floating-point type (otherwise 
> emin
> and emax would no longer make any sense). But with the current status, some
> floating-point numbers of exponent LDBL_MAX_EXP = 1024 (those with f54 = 1, if
> I'm not mistaken) are not elements of the type "long double". The reason is
> that the first double must be the exact value rounded to nearest (with the 
> even
> rounding rule).

That wording defines what "normalized" is, so that values with maximum 
exponent in this case don't count as normalized because not all values 
with that exponent in the floating-point model are representable, and 
defines LDBL_MAX so that it doesn't need to be normalized (and in this 
case isn't, by that definition of normalized).  The definition of 
LDBL_MAX_EXP is unchanged; it still just requires 2^(LDBL_MAX_EXP-1) to be 
representable, without requiring it to be normalized.

> > LDBL_MAX should 
> > be the largest value, which in this case is not normalized, and 
> LDBL_MAX_EXP has nothing to do with what exponents do or do not have 
> > normalized values.
> 
> But the largest value must be at least the largest normalized value, which
> equals (1−b^(−p))b^emax. Currently, LDBL_MAX is smaller than this value.

That's not a normalized value, because not all values with exponent emax 
are representable in the type, so none of them are normalized.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-20 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #8 from Vincent Lefèvre  ---
It seems that the choice emin = -968, emax = 1024, p = 106 had been made
originally because all normalized values in this floating-point system (as
specified by 5.2.4.2.2p2) are representable as a sum of two doubles (and even
two non-overlapping doubles). However, the mistake is that not all pairs of
doubles are valid in the type "long double" (as you can see with the LDBL_MAX
value). If only valid pairs are considered, then possible choices are:

* emin = -968, emax = 1024, p = 53 (but a poor choice, as it would make the
type appear no better than double);

* emin = -968, emax = 1023, p = 106 (thus just decrease emax from the current
parameters);

* emin = -967, emax = 1023, p = 107 (I think that this would be a better
choice). The extra bit of precision comes from the sign of the second double.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-20 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #7 from Vincent Lefèvre  ---
(In reply to jos...@codesourcery.com from comment #6)
> The current proposed wording for DR#467 
>  changes 
> "maximum representable finite floating-point number, [ math formula ]" to 
> "maximum representable finite floating-point number; if that value is 
> normalized, its value is [ math formula ],", which I think addresses the 
> first issue without requiring any change to LDBL_MAX_EXP;

This change concerns the LDBL_MAX definition. However, there is another issue,
related to the floating-point model 5.2.4.2.2p2 and p3. I assume that the
intent of this definition is to require that *all* normalized floating-point
numbers from this model are elements of the floating-point type (otherwise emin
and emax would no longer make any sense). But with the current status, some
floating-point numbers of exponent LDBL_MAX_EXP = 1024 (those with f54 = 1, if
I'm not mistaken) are not elements of the type "long double". The reason is
that the first double must be the exact value rounded to nearest (with the even
rounding rule).

> LDBL_MAX should 
> be the largest value, which in this case is not normalized, and 
> LDBL_MAX_EXP has nothing to do with what exponents do or do not have 
> normalized values.

But the largest value must be at least the largest normalized value, which
equals (1−b^(−p))b^emax. Currently, LDBL_MAX is smaller than this value.

> It's true that GCC cannot handle IBM long double constants with more than 
> 106 mantissa bits, including the largest representable finite value.

With the current LDBL_MAX_EXP definition, GCC can handle all values with 53
mantissa bits, but this becomes wrong for 54 mantissa bits.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-19 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #6 from joseph at codesourcery dot com  ---
The current proposed wording for DR#467 
 changes 
"maximum representable finite floating-point number, [ math formula ]" to 
"maximum representable finite floating-point number; if that value is 
normalized, its value is [ math formula ],", which I think addresses the 
first issue without requiring any change to LDBL_MAX_EXP; LDBL_MAX should 
be the largest value, which in this case is not normalized, and 
LDBL_MAX_EXP has nothing to do with what exponents do or do not have 
normalized values.  (The last proposed change would require LDBL_EPSILON 
to change, however.)

It's true that GCC cannot handle IBM long double constants with more than 
106 mantissa bits, including the largest representable finite value.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2017-06-18 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

Vincent Lefèvre  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #5 from Vincent Lefèvre  ---
I'm reopening this PR since it is actually not solved. There are two remaining
issues.

The first issue is no longer the formula, but the fact that LDBL_MAX is
strictly less than the maximum normalized number in the floating-point model. I
think that either LDBL_MAX_EXP should be reduced to 1023 (thus any
representable value with exponent 1024 would be regarded as unnormalized), or
the standard should be corrected in some other way, e.g. to allow an incomplete
binade for the largest exponent.

The second issue is that one can get a finite value
0x1.fffffffffffff7ffffffffffffcp+1023 that is strictly larger than LDBL_MAX =
0x1.fffffffffffff7ffffffffffff8p+1023 (see the first and third numbers output
by my program given in comment 0). Thus LDBL_MAX is not the maximum
representable finite number as required by the standard.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2016-11-17 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

Vincent Lefèvre  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from Vincent Lefèvre  ---
(In reply to Vincent Lefèvre from comment #0)
> By definition, for radix b = 2, LDBL_MAX is the value (1 - 2^(-p)) * 2^emax
> (see §5.2.4.2.2p12), which is the largest value representable in the
> floating-point model.

There has been a defect report and this is no longer the case. See:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2092.htm

"DR 467 decided that FLT_MAX, DBL_MAX, LDBL_MAX are the maximum representable
finite numbers for their respective types."

So, this solves this issue: LDBL_MAX is correct according to this DR. As a
consequence, I'm resolving the PR as invalid.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2014-06-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
I think lowering LDBL_MAX_EXP would be far worse than this.
The double-double format is unusable for any real numerics in so many ways that
this is just one small part of it; the variable precision, ranging from 106
bits to thousands of bits depending on the exact value, is far worse.


[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2014-06-03 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #3 from Vincent Lefèvre vincent-gcc at vinc17 dot net ---
The variable precision is unavoidable with this format (this is even a feature,
despite the drawbacks). But the fact that the variable precision is problematic
by itself isn't a reason not to try to solve other issues.

[Bug c/61399] LDBL_MAX is incorrect with IBM long double format / overflow issues near large values

2014-06-02 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399

--- Comment #1 from Vincent Lefèvre vincent-gcc at vinc17 dot net ---
(In reply to Vincent Lefèvre from comment #0)
> One may choose to keep the behavior, i.e. consider that the high double is
> the value rounded to double precision, but this means that the
> floating-point model would need to be changed; otherwise some values are not
> representable, as shown above.

By "the floating-point model would need to be changed", I mean, for instance,
choosing LDBL_MAX_EXP = 1023. I think that this would be correct. A possible
drawback is that one would have LDBL_MAX_EXP < DBL_MAX_EXP, but I don't think
that this is a problem (note that one already has LDBL_MIN_EXP > DBL_MIN_EXP).
This would just mean that there are non-normalized representable values outside
the normalized range.