Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

Doug Cutting Wed, 16 Nov 2005 12:33:21 -0800

Yonik Seeley wrote:

> Hmmm, very interesting idea.
> Less than one decimal digit of precision might be hard to swallow when
> you have to add scores together though:
>
> smallfloat(score1) + smallfloat(score2) + smallfloat(score3)
>
> Do you think that the 5/3 exponent/mantissa split is right for this,
> or would a 4/4 be better?

The float epsilon should ideally be less than the minimum score increment, and the float range should ideally be at least 100x greater than the maximum score increment, to permit boosting, large queries, etc.

Given a 100M document collection, the maximum idf is log(100M) = ~18, with a length-normalized tf of 1, for a max of 18. So the float range should ideally be around 1800 or greater.

The minimum idf is 1, and the minimum normalized tf with 10k word documents is 1/100. So the float epsilon should ideally be less than 1/100.

5 bits of mantissa and 3 bits of exponent is closest to this, but not quite there, with an epsilon of 1/32 and a range of up to ~1000.
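
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the idf is a natural log of the document count and that the length-normalized tf is 1/sqrt(document length); the function and variable names are made up for illustration and are not Lucene API. The 1/32 and ~1000 figures are simply taken from the paragraph above, not derived from any particular byte encoding.

```python
import math

def score_requirements(num_docs, max_doc_length, headroom=100):
    """Return (required_epsilon, required_range) for a compact score float.

    Assumptions (hypothetical, for illustration only):
      - idf is the natural log of the collection size, with a minimum of 1
      - normalized tf ranges from 1/sqrt(max_doc_length) up to 1
      - headroom is the 100x factor for boosting and large queries
    """
    max_idf = math.log(num_docs)              # ~18.4 for 100M documents
    min_tf = 1 / math.sqrt(max_doc_length)    # 1/100 for 10k-word documents
    min_idf = 1.0
    max_tf = 1.0
    required_epsilon = min_idf * min_tf       # smallest score increment
    required_range = headroom * max_idf * max_tf
    return required_epsilon, required_range

required_epsilon, required_range = score_requirements(100_000_000, 10_000)

# Figures quoted above for 5 bits of mantissa and 3 bits of exponent.
quoted_epsilon = 1 / 32
quoted_range = 1000

epsilon_ok = quoted_epsilon <= required_epsilon
range_ok = quoted_range >= required_range

print("required epsilon:", required_epsilon, "quoted:", quoted_epsilon, "ok:", epsilon_ok)
print("required range:", round(required_range), "quoted:", quoted_range, "ok:", range_ok)
```

Under these assumptions both checks come out false, which matches the "closest, but not quite there" reading: the quoted epsilon is coarser than 1/100 and the quoted range falls short of ~1800.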


Did I get the math right?

Doug
