On 12/12/15 08:41, Richard Henderson wrote:
> On 12/11/2015 03:38 PM, Chen Gang wrote:
>>
>> On 12/11/15 05:17, Richard Henderson wrote:
>>> On 12/10/2015 06:15 AM, Chen Gang wrote:
>>>> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
>>> ...
>>>> +static uint64_t fr_to_man(float64 d)
>>>> +{
>>>> +    uint64_t val = get_f64_man(d) << 7;
>>>> +
>>>> +    if (get_f64_exp(d)) {
>>>> +        val |= TILEGX_F_MAN_HBIT;
>>>> +    }
>>>> +
>>>> +    return val;
>>>> +}
>>>
>>> One presumes that "HBIT" is the ieee implicit one bit.
>>> A better name or better comments would help there.
>>>
>>
>> OK, thanks. And after think of again, I guess, the real hardware does
>> not use HBIT internally (use the full 64 bits as mantissa without HBIT).
>
> It must do.  Otherwise the arithmetic doesn't work out.
>

Oh, yes, and we have to use my original implementation (60 bits for the
mantissa, 4 bits for other uses).

>> But what I have done is still OK (use 59 bits + 1 HBIT as mantissa), for
>> 59 bits are enough for double mantissa (52 bits). It makes the overflow
>> processing easier, but has to process mul operation specially.
>
> What you have works.  But the mul operation isn't as special as you make it
> out -- aside from requiring at least 104 bits as intermediate -- in that when
> one implements what the hardware does, subtraction also may require
> significant normalization.
>

I guess you misunderstood what I said (my English is not very good).

For mul, it needs at least (104 - 1) bits. At present, we have 120 bits
for it (in fact, our mul generates a 119-bit result), so it is enough.

>> According to floatsidf, it seems "4", but after I expanded the bits, I
>> guess, it is "7".
>>
>> /*
>>  * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
>>  *
>>  *   17  16  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
>>  *
>>  *    1   0   0   0   0   1   1   0   1   1   0   0   0   0   0   0   0   0
>>  *
>>  *    0   0   0   0   0   1   1   0   1   1   1   => 0x37(55)
>>  *
>>  *    0   1   1   1   1   1   1   1   1   1   1   => 0x3ff
>>  *
>>  */
>
> That's the exponent within the flags temporary.  It has nothing to do with
> the position of the extracted mantissa.
>

0x37(55) + 4 (guard bits) + 1 (HBIT) = 60 bits.

So, if the above is correct, the mantissa is 60 bits (with HBIT), and in
flags bit 18 is for overflow and bit 19 for underflow (bit 20 must be for
sign).

> FWIW, the minimum shift would be 3, in order to properly implement rounding;
> if the hardware uses a shift of 4, that's fine too.
>

So I guess it uses 4 guard bits.

> What I would love to know is if the shift present in floatsidf is not really
> required; equally valid to adjust 0x21b00 by 4.  Meaning normalization would
> do a proper job with the entire given mantissa.  This would require better
> documentation, or access to hardware to verify.
>

I guess that before calling any fdouble insn, we can use the low 4 bits as
mantissa (e.g. to calculate mul), but when calling any fdouble insn, we
cannot use the low 4 guard bits, so floatsidf has to shift left by 4 bits.

>>>> +uint64_t helper_fdouble_addsub(CPUTLGState
>> And for my current implementation (I guess, it should be correct):
>>
>> typedef union TileGXFPDFmtV {
>>     struct {
>>         uint64_t mantissa : 60;  /* mantissa */
>>         uint64_t overflow : 1;   /* carry/overflow bit for absolute
>>                                     add/mul */
>>         uint64_t unknown1 : 3;   /* unknown */
>
> I personally like to call all 4 of the top bits overflow.  But I have no idea
> what the real hardware actually does.
>
>> In helper_fdouble_addsub(), both dest and srca are unpacked, so they are
>> within 60 bits. So one time absolute add are within 61 bits, so let bit
>> 61 as overflow bit is enough.
>
> True.  But if all 4 top bits are considered overflow, then one could
> implement floatdidf fairly easily.  But I suspect that real hw doesn't work
> that way, or it would have already been done.
>

So I only assumed bit 60 is for overflow; the high 3 bits are unknown.

For me, if one bit is enough for overflow, the hardware will keep the
other bits for another use (or reserve them for the future).

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed
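
For reference, a minimal standalone sketch of the mantissa layout discussed in this thread, not the code from the patch. It assumes the 60-bit layout described above (implicit one at bit 59, the 52 fraction bits shifted left by 7) and that the exponent sits in bits 17:7 of the flags word, which is how the bit table in the quoted comment reads. All sketch_* names are hypothetical, and plain C double stands in for QEMU's float64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, taken from the discussion (not verified on hardware). */
#define SKETCH_MAN_BITS    60            /* total unpacked mantissa width */
#define SKETCH_HIDDEN_BIT  (1ULL << 59)  /* IEEE implicit one ("HBIT")    */
#define SKETCH_FRAC_SHIFT  7             /* 52 fraction bits land in 58:7 */

/* Hypothetical helper: raw bit pattern of an IEEE 754 binary64 value. */
static uint64_t sketch_f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof(u));
    return u;
}

/* Hypothetical helper: the 52-bit fraction field. */
static uint64_t sketch_f64_frac(double d)
{
    return sketch_f64_bits(d) & ((1ULL << 52) - 1);
}

/* Hypothetical helper: the 11-bit biased exponent field. */
static unsigned sketch_f64_exp(double d)
{
    return (unsigned)((sketch_f64_bits(d) >> 52) & 0x7ff);
}

/* Same shape as fr_to_man() in the quoted patch. */
static uint64_t sketch_to_man(double d)
{
    uint64_t val = sketch_f64_frac(d) << SKETCH_FRAC_SHIFT;

    if (sketch_f64_exp(d)) {
        val |= SKETCH_HIDDEN_BIT;  /* normal number: add the implicit one */
    }
    /* The result never reaches the 4 bits above the 60-bit mantissa. */
    assert((val >> SKETCH_MAN_BITS) == 0);
    return val;
}

/* Exponent field of a flags word, assuming it occupies bits 17:7. */
static unsigned sketch_flags_exp(uint64_t flags)
{
    return (unsigned)((flags >> 7) & 0x7ff);
}
```

With these assumptions, sketch_to_man(1.0) has only bit 59 set, and sketch_flags_exp(0x21b00) gives 0x436, which equals 0x3ff + 55.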