On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> I had a brief look at libbid and am totally unimpressed.
> Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
> conversions at all (we emit calls to __bid_* functions which don't exist),
That's bug 65833.

> the library (or the way we configure it) doesn't care about exceptions nor
> rounding mode (see following testcase)

And this is related to the never-properly-resolved issue about the split of
responsibility between libgcc, libdfp and glibc.

Decimal floating point has its own rounding mode, set with fe_dec_setround
and read with fe_dec_getround (so this test is incorrect).  In some cases
(e.g. Power), that's a hardware rounding mode.  In others, it needs to be
implemented in software as a TLS variable.  In either case, it's part of the
floating-point environment, so should be included in the state manipulated
by functions using fenv_t or femode_t.  Exceptions are shared with binary
floating point.  libbid in libgcc has its own TLS rounding mode and
exceptions state, but the former isn't connected to fe_dec_setround /
fe_dec_getround functions, while the latter isn't the right way to do
things when there's hardware exceptions state.

libdfp - https://github.com/libdfp/libdfp - is a separate library, not part
of libgcc or glibc (and with its own range of correctness bugs) -
maintained, but not very actively (maybe more so than the DFP support in
GCC - we haven't had a listed DFP maintainer since 2019).  It has various
standard DFP library functions - maybe not the full C23 set, though some of
the TS 18661-2 functions did get added, so it's not just the old TR 24732
set.  That includes its own version of the libgcc support, which I think
has some more support for using exceptions and rounding modes.  It includes
the fe_dec_getround and fe_dec_setround functions.  It doesn't do anything
to help with the issue of including the DFP rounding state in the state
manipulated by functions such as fegetenv.

Being a separate library probably in turn means that it's less likely to be
used (although any code that uses DFP can probably readily enough choose to
use a separate library if it wishes).
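To make the software case concrete, here is a hypothetical sketch (not code
from libbid, libdfp or glibc) of what a per-thread DFP rounding mode behind
fe_dec_getround / fe_dec_setround amounts to when there is no hardware
rounding mode.  It is written in Python, with threading.local standing in
for a C TLS variable; the mode constants are made up for the illustration.

```python
import threading

# Illustrative mode constants; the real values are implementation-defined.
FE_DEC_TONEAREST = 0          # round to nearest, ties to even (the default)
FE_DEC_TOWARDZERO = 1
FE_DEC_UPWARD = 2
FE_DEC_DOWNWARD = 3
FE_DEC_TONEARESTFROMZERO = 4

_VALID_MODES = (FE_DEC_TONEAREST, FE_DEC_TOWARDZERO, FE_DEC_UPWARD,
                FE_DEC_DOWNWARD, FE_DEC_TONEARESTFROMZERO)

_tls = threading.local()      # one rounding mode per thread, like a C TLS var

def fe_dec_getround():
    # A thread that never set a mode still sees the default.
    return getattr(_tls, "round", FE_DEC_TONEAREST)

def fe_dec_setround(mode):
    # Return nonzero on failure, matching the fesetround convention.
    if mode not in _VALID_MODES:
        return 1
    _tls.round = mode
    return 0
```

The point of the sketch is the state's location: because it lives in one
specific library's TLS variable, any other copy of the conversion code (in
libgcc, libdfp or glibc) that keeps its own variable will silently disagree
with it, which is exactly the coordination problem described above.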
And it introduces issues with linker command line ordering, if the user
intends to use libdfp's copy of the functions but the linker processes
-lgcc first.

For full correctness, at least some functionality (such as the rounding
modes and associated inclusion in fenv_t) would probably need to go in
glibc.  See
https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html for
more discussion.  But if you do put some things in glibc, maybe you still
don't want the _BitInt conversions there?  Rather, if you keep the _BitInt
conversions in libgcc (even when the other support is in glibc), you'd have
some libc-provided interface for libgcc code to get the DFP rounding mode
from glibc in the case where it's handled in software, like some interfaces
already present in the soft-float powerpc case to provide access to its
floating-point state from libc (and something along the lines of
sfp-machine.h could tell libgcc how to use either that interface or
hardware instructions to access the rounding mode and exceptions as
needed).

> and for integral <-> _Decimal32
> conversions implement them as integral <-> _Decimal64 <-> _Decimal32
> conversions.  While in the _Decimal32 -> _Decimal64 -> integral
> direction that is probably ok, even if exceptions and rounding (other than
> to nearest) were supported, the other direction I'm sure can suffer from
> double rounding.

Yes, double rounding would be an issue for converting 64-bit integers to
_Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers like
that since they can be exactly represented in _Decimal64; it would be fine
to convert 64-bit integers via _Decimal128).

> So, wonder if it wouldn't be better to implement these in the soft-fp
> infrastructure which at least has the exception and rounding mode support.
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for.
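For finite numbers the decoding is indeed that short.  The following is an
illustrative Python sketch of the IEEE 754-2019 BID32 field layout (not
code from libbid or any of the libraries discussed; infinities and NaNs are
only detected, and a too-large significand is treated as a noncanonical
representation of zero):

```python
def decode_bid32(x):
    """Decode a finite 32-bit BID pattern x into (sign, significand,
    exponent), with value (-1)**sign * significand * 10**exponent."""
    sign = x >> 31
    if (x >> 29) & 0x3 == 0x3:                   # test bits 30:29 (the "11" case)
        if (x >> 26) & 0x1f in (0x1e, 0x1f):     # 11110 = Inf, 11111 = NaN
            raise ValueError("non-finite")
        exponent = (x >> 21) & 0xff              # 8 exponent bits after the 11
        significand = 0x800000 | (x & 0x1fffff)  # implicit leading 100 bits
    else:
        exponent = (x >> 23) & 0xff              # exponent directly after sign
        significand = x & 0x7fffff               # 23 explicit significand bits
    if significand > 9999999:                    # noncanonical: value is zero
        significand = 0
    return sign, significand, exponent - 101     # decimal32 exponent bias

# Example patterns: 0x32800001 encodes 1 * 10**0,
# and 0x3200000A encodes 10 * 10**-1 (another representation of 1.0).
```

The two tests on the bits below the sign bit select between the two field
layouts; everything else is masks, shifts and the bias subtraction.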
Note that representations with too-large significand are defined to be
noncanonical representations of zero, so you need to take care of that in
decoding BID.

> Now, sure, 5MB out of that are generated tables in
> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplying those
> several times.  _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 powers of ten
> would be just huge.  Powers of ten up to 10^19 fit in a 64-bit limb, so we
> might need say < 32 multiplications to cover it all (but with the current
> 575 bits limitation far less).  Perhaps later on write a few selected
> powers of 10 as _BitInt to decrease that number.

You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N etc.
up to 10^6144 (or rather up to 10^6111, which can then be multiplied by a
34-digit integer significand), so that only one multiplication is needed to
get the power of 10 and then a second multiplication by the significand.
(Or split into three parts at the cost of an extra multiplication, or
multiply the significand by 1, 10, 100, 1000 or 10000 as a multiplication
within 128 bits and so only need to compute 10^k for k a multiple of 5, or
any number of variations on those themes.)

> > For conversion *from _BitInt to DFP*, the _BitInt value needs to be
> > expressed in decimal.
> > In the absence of optimized multiplication /
> > division for _BitInt, it seems reasonable enough to do this naively
> > (repeatedly dividing by a power of 10 that fits in one limb to determine
> > base 10^N digits from the least significant end, for example), modulo
> > detecting obvious overflow cases up front (if the absolute value is at
>
> Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
> and instead repeatedly multiply like in the other direction and then just
> divide once with remainder?

I don't know what's most efficient here, given that it's quadratic in the
absence of optimized multiplication / division (so a choice between
different approaches that take quadratic time).

-- 
Joseph S. Myers
jos...@codesourcery.com
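As an illustration of the naive least-significant-first approach quoted
above, here is a Python sketch (Python integers stand in for multi-limb
_BitInt values; 10**19 is the largest power of 10 that fits in a 64-bit
limb):

```python
# Naive _BitInt -> decimal conversion step: repeatedly divide by the largest
# power of 10 that fits in one 64-bit limb, peeling off base-10**19 "digits"
# from the least significant end.  Each full-width division is linear in the
# number of limbs, so the whole loop is quadratic, matching the discussion.
LIMB_POW10 = 10**19     # largest power of 10 below 2**64

def to_base_1e19(n):
    """Return the base-10**19 digits of n >= 0, least significant first."""
    digits = []
    while True:
        n, rem = divmod(n, LIMB_POW10)
        digits.append(rem)
        if n == 0:
            return digits
```

Each returned entry fits in one limb and can then be converted to nine or
nineteen decimal digits with cheap single-limb arithmetic; the alternative
quoted above instead builds one big power of 10 by repeated multiplication
and performs a single wide division with remainder.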