Hi,
On Thu, May 30 2019, Tejas Joshi wrote:
> Hello.
> I tried to check the values for significand words using _Float128
> using a test program with value larger than 64 bit.
> Test program :
>
> int main ()
> {
> _Float128 x = 18446744073709551617.5; (i.e. 2^64 + 1.5 which is
> certainly longer than 64-bit)
> _Float128 y = __builtin_roundf128 (x);
> }
Interesting, I was also puzzled for a moment. But notice that:
int main ()
{
_Float128 x = 18446744073709551617.5f128;
_Float128 y = __builtin_roundf128 (x);
}
behaves as expected... the difference is of course the f128 suffix
attached to the literal constant (see
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Floating-Types.html).
I would have expected GCC to pick a wider type when a constant does not
fit into a double, but apparently that does not happen. I would have to
check, but it is probably the right behavior according to the standard,
since an unsuffixed floating constant has type double.
>
> The lower words of significand (sig[1] and sig[0] for 64-bit system)
> are still being zero. I haven't included the roundevenf128 yet but
> inspecting this on real_round function.
I figured out what was going on when I noticed that in your testcase
sig[0] was equal to 0x8000000000000000, so some precision had been
lost. From there it was easy to guess that the constant had been
represented in a narrower type.
Hope this helps,
Martin