Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

Doug Cutting Tue, 01 Feb 2005 10:01:10 -0800

David Spencer wrote:

+(f1:t1^2.0 t1) +(f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) (f1:t3^2.0 t3) (f1:t4^2.0 t4) (f1:t5^2.0 t5) f1:"t1 t2 t3 t4 t5"~5^3.0 "t1 t2 t3 t4 t5"~2^1.5

This looks great to me! I'd make mand=true the default, i.e., provide a method that doesn't take this parameter. Similarly, we might default phraseBoosts[i] to boolBoosts[i]*phraseBoost, and slops to infinity. What we want is something that exposes only the knobs we think most folks will need. Ideally we wouldn't even need to specify fieldBoosts: short fields like titles get a larger lengthNorm, which already boosts them a lot.
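A minimal sketch of those defaults, to make the proposal concrete. The class and method names here are hypothetical (they are not from David's code), and "infinity" for slop is approximated with Integer.MAX_VALUE. Note that the example queries above are consistent with this rule: boolBoosts of 2.0 and 1.0 with a phraseBoost of 1.5 give phrase boosts of 3.0 and 1.5.

```java
// Hypothetical helper, not part of Lucene or of the code under discussion.
final class MultiFieldDefaults {

    // Default phraseBoosts[i] = boolBoosts[i] * phraseBoost.
    static float[] defaultPhraseBoosts(float[] boolBoosts, float phraseBoost) {
        float[] phraseBoosts = new float[boolBoosts.length];
        for (int i = 0; i < boolBoosts.length; i++) {
            phraseBoosts[i] = boolBoosts[i] * phraseBoost;
        }
        return phraseBoosts;
    }

    // Default every slop to "infinity" (assumption: Integer.MAX_VALUE stands in for it).
    static int[] defaultSlops(int numFields) {
        int[] slops = new int[numFields];
        for (int i = 0; i < numFields; i++) {
            slops[i] = Integer.MAX_VALUE;
        }
        return slops;
    }
}
```

A convenience method that omits mand, phraseBoosts and slops could then fill them in with mand=true and the two helpers above before delegating to the full method.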

But perhaps we should back off and first just evaluate single-field search with different idf, tf (and perhaps lengthNorm and sloppyFreq) definitions. Once we're happy with those, we can return to different multi-field query formulations.
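For reference, here is a sketch of the baseline definitions that such an evaluation would vary. The formulas are what I understand DefaultSimilarity to use; the class name is made up for illustration and this is standalone code, not a Similarity subclass.

```java
// Standalone illustration of the baseline formulas (assumed to match DefaultSimilarity).
final class BaselineScoring {

    // Term frequency factor: square root of the raw within-document frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Length normalization: shorter fields get a larger value.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Sloppy phrase match weight: closer matches count for more.
    static double sloppyFreq(int distance) {
        return 1.0 / (distance + 1);
    }
}
```

With these, a 4-term title gets a lengthNorm of 0.5 while a 100-term body gets 0.1, which is the implicit title boost mentioned earlier.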

Let's start with the issue that's been raised so much: whether idf is better defined with log() or sqrt(log()).
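To make the two candidates concrete, here is a rough sketch. The log() form is what I understand DefaultSimilarity.idf to compute; the sqrt(log()) form is only one plausible reading of the alternative (the square root of that same quantity), so treat it as an assumption. The class name is hypothetical.

```java
// Illustrative only: two idf candidates to compare in the benchmark.
final class IdfVariants {

    // log() form, assumed to match DefaultSimilarity: log(numDocs / (docFreq + 1)) + 1.
    static double logIdf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // sqrt(log()) form (assumption): square root of the log() form above.
    static double sqrtLogIdf(int docFreq, int numDocs) {
        return Math.sqrt(logIdf(docFreq, numDocs));
    }
}
```

The practical difference is that the square root compresses the gap between rare and common terms, so rare terms dominate the score less.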

Doug

