On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> I had a brief look at libbid and am totally unimpressed.
> Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
> conversions at all (we emit calls to __bid_* functions which don't exist),

That's bug 65833.

> the library (or the way we configure it) doesn't care about exceptions nor
> rounding mode (see following testcase)

And this is related to the never-properly-resolved issue about the split 
of responsibility between libgcc, libdfp and glibc.

Decimal floating point has its own rounding mode, set with fe_dec_setround 
and read with fe_dec_getround (so this test is incorrect).  In some cases 
(e.g. Power), that's a hardware rounding mode.  In others, it needs to be 
implemented in software as a TLS variable.  In either case, it's part of 
the floating-point environment, so should be included in the state 
manipulated by functions using fenv_t or femode_t.  Exceptions are shared 
with binary floating point.

libbid in libgcc has its own TLS rounding mode and exceptions state, but 
the former isn't connected to fe_dec_setround / fe_dec_getround functions, 
while the latter isn't the right way to do things when there's hardware 
exceptions state.

libdfp - https://github.com/libdfp/libdfp - is a separate library, not 
part of libgcc or glibc (and with its own range of correctness bugs) - 
maintained, but not very actively (maybe more so than the DFP support in 
GCC - we haven't had a listed DFP maintainer since 2019).  It has various 
standard DFP library functions - maybe not the full C23 set, though some 
of the TS 18661-2 functions did get added, so it's not just the old TR 
24732 set.  That includes its own version of the libgcc support, which I 
think has some more support for using exceptions and rounding modes.  It 
includes the fe_dec_getround and fe_dec_setround functions.  It doesn't do 
anything to help with the issue of including the DFP rounding state in the 
state manipulated by functions such as fegetenv.

Being a separate library probably in turn means that it's less likely to 
be used (although any code that uses DFP can probably readily enough 
choose to use a separate library if it wishes).  And it introduces issues 
with linker command line ordering, if the user intends to use libdfp's 
copy of the functions but the linker processes -lgcc first.

For full correctness, at least some functionality (such as the rounding 
modes and associated inclusion in fenv_t) would probably need to go in 
glibc.  See 
https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html 
for more discussion.

But if you do put some things in glibc, maybe you still don't want the 
_BitInt conversions there?  Rather, if you keep the _BitInt conversions in 
libgcc (even when the other support is in glibc), you'd have some 
libc-provided interface for libgcc code to get the DFP rounding mode from 
glibc in the case where it's handled in software, like some interfaces 
already present in the soft-float powerpc case to provide access to its 
floating-point state from libc (and something along the lines of 
sfp-machine.h could tell libgcc how to use either that interface or 
hardware instructions to access the rounding mode and exceptions as 
needed).

> and for integral <-> _Decimal32
> conversions implement them as integral <-> _Decimal64 <-> _Decimal32
> conversions.  While in the _Decimal32 -> _Decimal64 -> integral
> direction that is probably ok, even if exceptions and rounding (other than
> to nearest) were supported, the other direction I'm sure can suffer from
> double rounding.

Yes, double rounding would be an issue for converting 64-bit integers to 
_Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers 
like that since they can be exactly represented in _Decimal64; it would be 
fine to convert 64-bit integers via _Decimal128).

> So, wonder if it wouldn't be better to implement these in the soft-fp
> infrastructure which at least has the exception and rounding mode support.
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for.  Now, sure, 5MB out of that are generated tables in

Note that representations with too-large significand are defined to be 
noncanonical representations of zero, so you need to take care of that in 
decoding BID.

> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplicating those
> several times, _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 powers of ten
> would be just huge.  In 64-bit limb fit power of ten until 10^19, so we
> might need say < 32 multiplications to cover it all (but with the current
> 575 bits limitation far less).  Perhaps later on write a few selected powers
> of 10 as _BitInt to decrease that number.

You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N 
etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied 
by a 34-digit integer significand), so that only one multiplication is 
needed to get the power of 10 and then a second multiplication by the 
significand.  (Or split into three parts at the cost of an extra 
multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000 
as a multiplication within 128 bits and so only need to compute 10^k for k 
a multiple of 5, or any number of variations on those themes.)

> > For conversion *from _BitInt to DFP*, the _BitInt value needs to be 
> > expressed in decimal.  In the absence of optimized multiplication / 
> > division for _BitInt, it seems reasonable enough to do this naively 
> > (repeatedly dividing by a power of 10 that fits in one limb to determine 
> > base 10^N digits from the least significant end, for example), modulo 
> > detecting obvious overflow cases up front (if the absolute value is at 
> 
> Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
> and instead repeatedly multiply like in the other direction and then just
> divide once with remainder?

I don't know what's most efficient here, given that it's quadratic in the 
absence of optimized multiplication / division (so a choice between 
different approaches that take quadratic time).

-- 
Joseph S. Myers
jos...@codesourcery.com

Reply via email to