Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

Doug Cutting Tue, 01 Feb 2005 11:20:01 -0800

David Spencer wrote:

Let's start with the issue that's been raised so much: whether idf is better defined with log() or sqrt(log()).
I can redo my page and rebuild indexes if necessary, I just need it clarified what we want to do, esp -> does the index need to be rebuilt?

The index only needs to be rebuilt if Field.setBoost() or Document.setBoost() is used (which we're not doing) or if the Similarity.lengthNorm() implementation is changed (Chuck may have altered this), since boosts and length norms are written into the index. tf and idf are computed at search time, so when comparing tf and idf implementations the index need not be rebuilt.
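To illustrate the index-time versus search-time split, here is a minimal toy sketch. It is not Lucene's actual code: the class ToyIndexedField and its fields are hypothetical names, and the 1/sqrt(numTerms) length norm is assumed as the default. The stored norm is fixed when the object is built (the "index" step), while tf and idf are passed in at scoring time and so can be swapped without rebuilding anything.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical toy model of one indexed field; not a Lucene class.
final class ToyIndexedField {
    final int termFreq;      // raw term frequency, stored unmodified
    final double storedNorm; // boost * lengthNorm, frozen at index time

    ToyIndexedField(int termFreq, int numTerms, double boost) {
        this.termFreq = termFreq;
        // Index time: boost and length norm are folded into one stored value.
        // Changing either one means this object (the "index") must be rebuilt.
        this.storedNorm = boost * (1.0 / Math.sqrt(numTerms));
    }

    // Search time: tf and idf are applied to raw statistics,
    // so different implementations can be compared on the same index.
    double score(DoubleUnaryOperator tf, double idf) {
        return tf.applyAsDouble(termFreq) * idf * storedNorm;
    }
}
```

The same ToyIndexedField instance can be scored with Math::sqrt or with a log-based tf, but trying a different length norm requires constructing it again.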

I guess it's obvious from the above, but just to make it clear: I'll change the page to do only single-field queries. But how many variations do we want to see in parallel? The current page shows 2x2 results, one for each combo of index and query, but I could, say, show several more queries in parallel w/ different weights...

For a start, let's look at idf=log(), idf=sqrt(log()), tf=sqrt() and tf=log(). In other words, the DefaultSimilarity definitions and Chuck's WikipediaSimilarity definitions.
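As a rough sketch of the variants being compared, the following standalone Java class writes out the two tf and two idf forms. The sqrt tf and the log(numDocs/(docFreq+1)) + 1 idf follow DefaultSimilarity. The exact shape of the log tf (1 + ln(freq)) and of the sqrt(log()) idf is an assumption made for illustration, not taken from WikipediaSimilarity, and the class name SimilarityVariants is hypothetical.

```java
// Illustrative sketch only; does not extend or call any Lucene class.
final class SimilarityVariants {
    // DefaultSimilarity-style tf: square root of the raw frequency.
    static double tfSqrt(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed log-damped tf: 1 + ln(freq), and 0 for a term that does not occur.
    static double tfLog(double freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    // DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idfLog(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Assumed sqrt(log()) variant: square root of the idf above.
    // idfLog stays positive whenever docFreq <= numDocs, so the root is defined.
    static double idfSqrtLog(int docFreq, int numDocs) {
        return Math.sqrt(idfLog(docFreq, numDocs));
    }

    public static void main(String[] args) {
        int docFreq = 50, numDocs = 100000; // made-up statistics for illustration
        double freq = 9.0;
        System.out.println("tf=sqrt, idf=log:       " + tfSqrt(freq) * idfLog(docFreq, numDocs));
        System.out.println("tf=sqrt, idf=sqrt(log): " + tfSqrt(freq) * idfSqrtLog(docFreq, numDocs));
        System.out.println("tf=log,  idf=log:       " + tfLog(freq) * idfLog(docFreq, numDocs));
        System.out.println("tf=log,  idf=sqrt(log): " + tfLog(freq) * idfSqrtLog(docFreq, numDocs));
    }
}
```

Since all four combinations read only raw term and document frequencies, they can be evaluated side by side against one index.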

We should also evaluate Chuck's lengthNorm() method. That will require two indexes (which you already have).

Doug

