Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

markharw00d Tue, 15 Nov 2005 14:08:23 -0800

I was thinking about the challenges of holding a score per document recently whilst trying to optimize the Lucene-embedded-in-Derby/HSQLDB code. I found myself actually wanting to visualize the problem and to see, in a graphical form, how sparse the result sets were and how the scores for a query were distributed.

I ended up adding a panel to Luke which does exactly this. I didn't get any blinding insights but it may be of interest anyway. I've already supplied Andrzej this visualisation code and he is waiting for Lucene 1.9 before releasing it as a part of an updated Luke.

Let me know if you want the code before then and I can mail it to you.



Cheers,
Mark


Doug Cutting wrote:

Yonik Seeley wrote:
Scoring recap... I think I've seen 4 different types of scoring
mentioned in this thread for a term expanding query on a single field:

1) query_boost
2) query_boost * (field_boost * lengthNorm)
3) query_boost * (field_boost * lengthNorm) * tf(t in q)
4) query_boost * (field_boost * lengthNorm) * tf(t in q) * idf(t in q)

1 & 2 can be done with ConstantScoreQuery
4 is currently done via rewrite to BooleanQuery and limiting the
number of terms.
3 is unimplemented AFAIK.
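
The four variants listed above differ only in which factors are multiplied in, which can be sketched as plain arithmetic. This is a minimal illustration, not Lucene's Similarity API: the function name `term_score` and its parameters are hypothetical, and each factor is assumed to be a precomputed float.

```python
def term_score(variant, query_boost, field_boost=1.0, length_norm=1.0,
               tf=1.0, idf=1.0):
    """Score contribution of one expanded term under variant 1, 2, 3 or 4.

    Hypothetical helper: every factor is assumed to be already computed.
    """
    if variant not in (1, 2, 3, 4):
        raise ValueError("variant must be 1, 2, 3 or 4")
    score = query_boost                      # variant 1: constant per query
    if variant >= 2:
        score *= field_boost * length_norm   # variant 2: add the field norm
    if variant >= 3:
        score *= tf                          # variant 3: add term frequency
    if variant >= 4:
        score *= idf                         # variant 4: add inverse doc frequency
    return score
```

Under these assumptions, variant 3 gives the same number as variant 4 with `idf` fixed at 1.0, and variants 1 and 2 do not depend on which term matched.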
3 is easy to implement as a subcase of 4, no?

The challenge is to implement 3 or 4 efficiently for very large queries w/o using gobs of RAM. One option is to keep a score per document, making the RAM use proportional to the size of the collection (or at least the number of non-zero matches, if a sparse representation is used) or, as in 4, proportional to the number of terms in the query (with a large constant--an i/o buffer).

Doug
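
The score-per-document option can be sketched in two forms, dense and sparse, to make the RAM trade-off concrete. This is an illustration under stated assumptions rather than Lucene code: the function names are hypothetical, postings are assumed to be in-memory lists of `(doc_id, tf)` pairs, and each term carries one precomputed weight.

```python
def accumulate_dense(max_doc, postings_by_term, term_weight):
    """One float per document in the collection: memory grows with max_doc."""
    scores = [0.0] * max_doc
    for term, postings in postings_by_term.items():
        weight = term_weight[term]
        for doc_id, tf in postings:
            scores[doc_id] += weight * tf
    return scores


def accumulate_sparse(postings_by_term, term_weight):
    """One entry per matching document: memory grows with the number of hits."""
    scores = {}
    for term, postings in postings_by_term.items():
        weight = term_weight[term]
        for doc_id, tf in postings:
            scores[doc_id] = scores.get(doc_id, 0.0) + weight * tf
    return scores


# Two expanded terms over a five-document collection (made-up data).
postings = {"cat": [(0, 1.0), (3, 2.0)], "car": [(3, 1.0)]}
weights = {"cat": 1.0, "car": 0.5}
dense = accumulate_dense(5, postings, weights)
sparse = accumulate_sparse(postings, weights)
```

Both forms visit each term's postings once, so neither needs all terms open at the same time. The alternative of merging all term streams together, as in 4, is not shown here.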

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



