Richard Henderson <richard.hender...@linaro.org> writes:

> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The Idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths.  It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
>     FloatClass cls;
>     int exp;
>     bool sign;
>     uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac;
> } FloatParts64;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac[4]; /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed.  Which may make muladd more
> understandable.

Would the big-endian formatting hamper the compiler on x86 where it can
do extra wide operations? I am still seeing a multi MFlop drop in
performance when converting the float128_addsub to the new code. If this
allows the compiler to do better on the code I can live with it.

>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
>                             FloatParts64 *b, FloatParts128 *c, ...)
> {
>     // handle nans
>     // produce 128-bit product into r
>     // handle p vs c special cases.
>     // zero-extend c to 128-bits
>     c->frac[1] = 0;
>     // perform 128-bit fractional addition
>     addsub_floats128(r, c, ...);
>     // fold 128-bit fraction to 64-bit sticky bit.
>     r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
>     FloatParts64 pa, pb;
>     FloatParts128 pc, pr;
>
>     float64_unpack_canonical(&pa.base, a, status);
>     float64_unpack_canonical(&pb.base, b, status);
>     float64_unpack_canonical(&pc.base, c, status);
>     muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
>     return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.

Are you talking about all our operations? Will we still need to
#ifdef CONFIG_INT128 in the softfloat code?

>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
>   qemu/int128: Add int128_or
>   qemu/int128: Add int128_clz, int128_ctz
>   qemu/int128: Rename int128_rshift, int128_lshift
>   qemu/int128: Add int128_shr
>   qemu/int128: Add int128_geu
>   softfloat: Use mulu64 for mul64To128
>   softfloat: Use int128.h for some operations
>   softfloat: Tidy a * b + inf return
>   softfloat: Add float_cmask and constants
>   softfloat: Inline float_raise
>   Test split to softfloat-parts.c.inc
>   softfloat: Streamline FloatFmt
>   Test float128_addsub
>   softfloat: Use float_cmask for addsub_floats
>   softfloat: Improve subtraction of equal exponent
>
>  include/fpu/softfloat-macros.h |  89 ++--
>  include/fpu/softfloat.h        |   5 +-
>  include/qemu/int128.h          |  61 ++-
>  fpu/softfloat.c                | 802 ++++++++++-----------------------
>  softmmu/physmem.c              |   4 +-
>  target/ppc/int_helper.c        |   4 +-
>  tests/test-int128.c            |  44 +-
>  fpu/softfloat-parts.c.inc      | 339 ++++++++++++++
>  fpu/softfloat-specialize.c.inc |  45 +-
>  9 files changed, 716 insertions(+), 677 deletions(-)
>  create mode 100644 fpu/softfloat-parts.c.inc

-- 
Alex Bennée
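
For reference, here is a rough sketch of what "just code it here" could look like for the big-endian word ordering in the quoted proposal, without going through Int128. It is not taken from the series: the helper names frac_add, frac_widen and frac_fold_sticky are hypothetical, and the fraction is modelled as a bare uint64_t array with frac[0] as the most significant word rather than as a real FloatParts structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of QEMU. A fraction is n 64-bit words
 * with frac[0] the MOST significant word (big-endian word ordering).
 */

/* r = a + b over n words; returns the carry out of the top word. */
static bool frac_add(uint64_t *r, const uint64_t *a, const uint64_t *b, int n)
{
    bool carry = false;

    assert(n > 0);
    /* Start at the least significant word, which is the last index. */
    for (int i = n - 1; i >= 0; i--) {
        uint64_t t = a[i] + carry;
        bool c1 = t < a[i];          /* wrapped while adding the carry */

        r[i] = t + b[i];
        carry = c1 || r[i] < t;      /* wrapped while adding b[i] */
    }
    return carry;
}

/* Zero-extend: keep the src_n most significant words, clear the rest. */
static void frac_widen(uint64_t *dst, int dst_n,
                       const uint64_t *src, int src_n)
{
    assert(src_n > 0 && src_n <= dst_n);
    for (int i = 0; i < dst_n; i++) {
        dst[i] = i < src_n ? src[i] : 0;
    }
}

/* Narrow to 'keep' words, folding the dropped words into a sticky bit. */
static void frac_fold_sticky(uint64_t *frac, int keep, int n)
{
    uint64_t lost = 0;

    assert(keep > 0 && keep <= n);
    for (int i = keep; i < n; i++) {
        lost |= frac[i];
    }
    frac[keep - 1] |= (lost != 0);
}
```

With this ordering, widening and narrowing never move the significant words: index 0 stays index 0, which is the storage-sharing property the proposal relies on. Whether a compiler turns the carry loop into add/adc pairs as readily as it does for a native 128-bit type is exactly the open performance question above, and this sketch makes no claim either way.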