Index-time boosts don't have much granularity (they are stored in a
single byte), so you would run out of distinct values pretty quickly,
unless I am misunderstanding your proposal.
From Similarity.encodeNorm:
/** Encodes a normalization factor for storage in an index.
*
* <p>The encoding uses a three-bit mantissa, a five-bit exponent, and
* the zero-exponent point at 15, thus
* representing values from around 7x10^9 to 2x10^-9 with about one
* significant decimal digit of accuracy. Zero is also represented.
* Negative numbers are rounded up to zero. Values too large to represent
* are rounded down to the largest representable value. Positive values too
* small to represent are rounded up to the smallest positive representable
* value.
*
* @see org.apache.lucene.document.Field#setBoost(float)
* @see org.apache.lucene.util.SmallFloat
*/
public static byte encodeNorm(float f) {
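
To make the granularity point concrete, here is a minimal stand-alone sketch of a byte encoding with the layout the Javadoc describes (three-bit mantissa, five-bit exponent, zero-exponent point at 15). The class and method names (NormSketch, encode, decode) are made up for illustration. This is not the Lucene source, and the real encodeNorm may differ in detail.

```java
// Hypothetical sketch: 8-bit float with a 3-bit mantissa and a 5-bit
// exponent, zero-exponent point at 15. Not taken from Lucene.
final class NormSketch {
    // Offset that aligns the 8-bit exponent of an IEEE float with the
    // 5-bit exponent used here.
    private static final int EXP_OFFSET = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, exponent and the top 3 mantissa bits; the low 21
        // mantissa bits are simply dropped (truncation, not rounding).
        int small = bits >> (24 - 3);
        if (small <= EXP_OFFSET) {
            // Zero and negatives map to 0; tiny positives map to the
            // smallest positive code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXP_OFFSET + 0x100) {
            // Too large: clamp to the largest representable code.
            return (byte) -1;
        }
        return (byte) (small - EXP_OFFSET);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // The boosts from the proposal: several of them collapse onto
        // the same stored value after a round trip.
        for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.3f}) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

With this layout there are only eight representable values between 1.0 and 2.0 (steps of 0.125), and the step doubles with each power of two, so a boost of 1.1 is stored as 1.0 and 1.2 as 1.125.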
On May 14, 2008, at 11:43 AM, Erick Erickson wrote:
Don't ask me why this occurred to me, since I'm working on a
completely different project... Mostly, this is intended to have
folks who really understand the scoring algorithms chime in and
tell me it's a silly idea <G>.
We've seen multiple threads asking the question: "How can I
cause more-recent documents to be scored higher?" and
several suggestions have been put forward.
What would happen if you had a "date factor" that you persisted
that was the *index-time* boost you applied to documents and
you kept increasing this factor every time period? Or boosted
each document by some factor based on the relevant date?
For instance, let's say I was indexing e-mails starting today.
All e-mails indexed today would get a boost (for all fields?) of 1.0.
Tomorrow, the boost would be 1.1, and the next day 1.2 etc.
Now, any search would automatically tend to push more recent documents
toward the top. The operative word here is "tend", since it wouldn't
have the problem of sorting on dates, which ignores scores.
I chose 1.0, 1.1, 1.2 at random, but you get the idea.
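
A minimal sketch of the scheme described above, assuming a base boost of 1.0 and a step of 0.1 per day as in the example. The names DateBoostSketch and boostFor are hypothetical, and clamping earlier-dated documents to the base boost is an assumption, not part of the proposal. The result would be handed to whatever index-time boost setter is in use.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derives an index-time boost from a document date.
final class DateBoostSketch {
    static final float BASE = 1.0f;          // boost on the first indexing day
    static final float STEP_PER_DAY = 0.1f;  // arbitrary, as in the example

    static float boostFor(LocalDate firstIndexDay, LocalDate docDate) {
        // Whole days elapsed since indexing began. Documents dated before
        // the first day get the base boost (an assumption of this sketch).
        long days = Math.max(0L, ChronoUnit.DAYS.between(firstIndexDay, docDate));
        return BASE + STEP_PER_DAY * days;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2008, 5, 14);
        for (int d = 0; d < 3; d++) {
            System.out.println(start.plusDays(d) + " -> " + boostFor(start, start.plusDays(d)));
        }
    }
}
```

Because the boost grows without bound, the ratio between the newest and oldest documents keeps widening over time, which is the concern raised in the next paragraph.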
My main concern is that at some point you would have *very* large
factor differences, and I don't know if you'd *ever* see really old
documents, but that's a danger no matter what you do. And
since I'm not even working on a Lucene project now, I don't have
the time to try it <G>. Can you recognize a plea for having
others do the hard work when you see it?
And who knows, I may just be parroting something already suggested,
which means that it took this long to actually sink in...
Best
Erick