Yonik Seeley wrote:
> Hmmm, very interesting idea.
> Less than one decimal digit of precision might be hard to swallow when
> you have to add scores together though:
>
> smallfloat(score1) + smallfloat(score2) + smallfloat(score3)
>
> Do you think that the 5/3 exponent/mantissa split is right for this,
> or would a 4/4 be better?
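
To make the concern about adding quantized scores concrete, here is a rough Python sketch. It assumes a hypothetical quantize() helper that keeps an implicit leading 1 plus a given number of stored mantissa bits, rounds down, and ignores exponent limits entirely; none of this is taken from an actual implementation, and the three scores are made-up values.

```python
import math

def quantize(x: float, mantissa_bits: int) -> float:
    """Round a positive float down to `mantissa_bits` stored mantissa bits.

    Hypothetical helper: assumes an implicit leading 1 and an unbounded
    exponent, so only the loss of precision is modelled, not overflow.
    """
    if x <= 0.0:
        return 0.0
    frac, exp = math.frexp(x)            # x == frac * 2**exp, 0.5 <= frac < 1
    steps = 1 << (mantissa_bits + 1)     # grid resolution within [0.5, 1)
    return math.floor(frac * steps) / steps * 2.0 ** exp

scores = [1.1, 2.3, 4.7]                 # illustrative values only
exact_sum = sum(scores)

for bits in (3, 4):
    approx_sum = sum(quantize(s, bits) for s in scores)
    rel_err = (exact_sum - approx_sum) / exact_sum
    print(f"{bits} mantissa bits: sum={approx_sum} vs exact={exact_sum:.2f} "
          f"(relative error {rel_err:.1%})")
```

Because this sketch always rounds down, the per-score errors all push the sum in the same direction instead of cancelling; a round-to-nearest scheme would behave differently.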

The float epsilon should ideally be less than the minimum score increment, and the float range should ideally be at least 100x the maximum score increment, to permit boosting, large queries, etc.

Given a 100M document collection, the maximum idf is ln(100M) = ~18, with a length-normalized tf of 1, for a maximum score increment of ~18. So the float range should ideally be around 1800 or greater.

The minimum idf is 1, and the minimum normalized tf with 10k word documents is 1/100. So the float epsilon should ideally be less than 1/100.
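
The two targets above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes a natural-log idf and a length normalization of 1/sqrt(document length), which is inferred from the 1/100 figure for 10k-word documents, and the constant names are made up for illustration.

```python
import math

NUM_DOCS = 100_000_000   # collection size from the estimate above
DOC_LENGTH = 10_000      # words in a long document
HEADROOM = 100           # desired range multiple for boosts, large queries

# Largest single score increment: maximum idf times a normalized tf of 1.
max_idf = math.log(NUM_DOCS)             # natural log, roughly 18.4
max_increment = max_idf * 1.0
required_range = HEADROOM * max_increment

# Smallest single score increment: minimum idf of 1 times the smallest
# normalized tf, assuming normalization by 1/sqrt(length).
min_tf = 1.0 / math.sqrt(DOC_LENGTH)     # 1/100
min_increment = 1.0 * min_tf

print(f"range should reach about {required_range:.0f}")
print(f"epsilon should be below about {min_increment}")
```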

5 bits of mantissa and 3 bits of exponent is closest to this, but not quite there, with an epsilon of 1/32 and a range of up to ~1000.
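
One way to check figures like these is to tabulate the candidate splits under an explicit model. The sketch below assumes an 8-bit unsigned format with an implicit leading 1 and every exponent code used for normalized values (no sign bit, zero, denormals, or infinities). The split_stats() function is hypothetical, and its results depend on those assumptions, so they need not match the estimates quoted above.

```python
def split_stats(exponent_bits: int, total_bits: int = 8):
    """Return (mantissa_bits, epsilon, dynamic_range) for one bit split.

    epsilon is the spacing between adjacent values just above 1.0;
    dynamic_range is the ratio of the largest to the smallest
    representable value under the simplified model described above.
    """
    mantissa_bits = total_bits - exponent_bits
    epsilon = 2.0 ** -mantissa_bits
    largest_mantissa = 2.0 - epsilon            # 1.111...b
    exponent_span = 2 ** exponent_bits - 1      # highest minus lowest exponent
    dynamic_range = largest_mantissa * 2.0 ** exponent_span
    return mantissa_bits, epsilon, dynamic_range

for exponent_bits in (3, 4, 5):
    mantissa_bits, epsilon, dynamic_range = split_stats(exponent_bits)
    print(f"exponent={exponent_bits} mantissa={mantissa_bits} "
          f"epsilon=1/{int(1 / epsilon)} range={dynamic_range:g}")
```

Changing the model (for example, reserving an exponent code for zero or choosing a different bias) shifts the range figures, so the table is best read as a comparison between splits and not as exact limits.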

Did I get the math right?

Doug
