In other words, for my first question, what I want to know is how I might consistently and correctly get the same max score for any two pairs of identical documents without having to rewrite major parts of lucene. I could find ALL the scores and divide them by the max, but that seems somehow wrong and not robust, especially since if I put the identical documents several times into the index, I get slightly different scores from a MoreLikeThis query.
Yours, --Asad. Asad Sayeed/Watson/IBM @IBMUS To java-user@lucene.apache.org 07/14/2008 10:15 cc PM Subject Stable score scaling; LSI again Please respond to [EMAIL PROTECTED] apache.org Hi, I have a couple of questions about how to alter the similarity scores. I need scores that can be thresholded, and whose thresholds remain stable even when I add documents to the IndexWriter. ie, identity should be a fixed value such as 1.0. I know that for efficiency reasons, Lucene doesn't do this. However, that level of efficiency is not as big a concern for me as getting a stable, thresholdable similarity score from, eg, "normal" cosine similarity. Is there a way to change the DefaultSimilarity trivally to get this feature, or is it a major overhaul? The searches from Lucene are being fed to another analyzer is why, so when the "identity" score changes by adding docs to the index, it messes up the rest of the processing. The other question I had was about scoring via Latent Semantic Indexing. I read in the archives of this list from way back when that LSI was hard to integrate into Lucene. Is that still the case? I mean, from what I understand, it is just transforming the index in some way. Yours, --Asad. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]