Hi,

On Thu, May 30 2019, Tejas Joshi wrote:
> Hello.
> I tried to check the values for significand words using _Float128
> using a test program with value larger than 64 bit.
> Test program :
>
> int main ()
> {
>     /* i.e. 2^64 + 1.5, which is certainly longer than 64 bits */
>     _Float128 x = 18446744073709551617.5;
>     _Float128 y = __builtin_roundf128 (x);
> }

Interesting, I was also puzzled for a moment.  But notice that:

int main ()
{
    _Float128 x = 18446744073709551617.5f128;
    _Float128 y = __builtin_roundf128 (x);
}

behaves as expected.  The difference is of course the f128 suffix
attached to the literal constant (see
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Floating-Types.html).

I would also have expected GCC to use a wider type when a constant does
not fit into a double, but that does not happen.  This appears to be the
right behavior according to the standard: in C, an unsuffixed floating
constant has type double.
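
To make the difference concrete, here is a minimal sketch.  It assumes
GCC in C mode on x86_64 (where _Float128 and the f128 suffix are
available and double is not evaluated in excess precision); the helper
names are made up for illustration and are not part of GCC.

```c
/* Sketch only: assumes GCC, C mode, x86_64.  All function names here
   are hypothetical helpers, not GCC internals.  */

/* Unsuffixed constant: its type is double, so the value is rounded to
   53 bits of precision (giving exactly 2^64) and only then converted
   to _Float128.  */
static _Float128
from_unsuffixed (void)
{
  return 18446744073709551617.5;
}

/* f128 suffix: the constant has type _Float128 (113 bits of
   precision), so 2^64 + 1.5 is represented exactly.  */
static _Float128
from_suffixed (void)
{
  return 18446744073709551617.5f128;
}

/* Return 1 if V is exactly 2^64, i.e. the low-order part was lost.  */
static int
is_exactly_two_pow_64 (_Float128 v)
{
  return v == 18446744073709551616.0f128;
}
```

Under those assumptions, is_exactly_two_pow_64 (from_unsuffixed ())
should be 1 and is_exactly_two_pow_64 (from_suffixed ()) should be 0,
with the two values differing by exactly 1.5.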

>
> The lower words of the significand (sig[1] and sig[0] on a 64-bit
> system) are still zero. I haven't included roundevenf128 yet; I am
> inspecting this in the real_round function.

I figured out what was going on when I realized that in your testcase,
sig[0] was equal to 0x8000000000000000, which meant some precision had
been lost.  From there it was easy to guess that the constant had first
been represented in a narrower type.
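
The same loss can be seen from user code by looking at the low 64 bits
of the value's encoding.  This is only a sketch: it assumes a
little-endian x86_64 target and inspects the IEEE binary128 in-memory
format, which is not the same layout as the sig[] array inside GCC's
real.c.  The helper names are again hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the low 64 bits of the binary128
   encoding of V.  Assumes a little-endian target such as x86_64, where
   the low half of the fraction is stored first.  */
static uint64_t
low_fraction_word (_Float128 v)
{
  uint64_t words[2];
  memcpy (words, &v, sizeof words);
  return words[0];
}

/* Constant of type double, converted to _Float128 at the call.  */
static uint64_t
low_word_unsuffixed (void)
{
  return low_fraction_word (18446744073709551617.5);
}

/* Constant of type _Float128, no intermediate rounding.  */
static uint64_t
low_word_suffixed (void)
{
  return low_fraction_word (18446744073709551617.5f128);
}
```

With the unsuffixed constant the low word should come out as zero,
because the 1.5 was already rounded away in double; with the f128
suffix it should be nonzero (0x0001800000000000 by my calculation,
since the 1.5 occupies fraction bits 48 and 47).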

Hope this helps,

Martin
