Daniel,

I haven't yet dealt with multiple indices, but I will in the not-too-distant future, so this sounds like a problem that will be important to me too. I just briefly read through the relevant code (e.g., MultiSearcher) to try to understand the issue.

My guess is that the problem arises because each index computes its tf's and idf's separately. That would make the searches against each index completely separate searches. Since the current scoring does not produce scores that are comparable across separate searches, the re-sorting of the hits in MultiSearcher.search() via the HitQueue would not accomplish its intended effect, and the final ranking would be incorrect. Is that the problem you are actually seeing?

If I've got it right, then yes, I believe what I'm proposing will fix this too, since it would make the scores coming back from the searches against the separate indices directly comparable, so the interleaving in MultiSearcher.search() would work properly.
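To make the suspected problem concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class name, the statistics, and the simplified score (sqrt(tf) * idf, ignoring norms, boosts, and query normalization) are all assumptions for illustration, and the idf formula is only assumed to resemble the default one, 1 + ln(numDocs / (docFreq + 1)). It shows how the same term with the same tf can get very different scores in two indices when each index uses its own document frequencies, and what a single idf computed over both indices would give instead.

```java
// Illustration only: plain Java, no Lucene classes involved.
final class IdfComparison {

    // Assumed idf formula: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Deliberately simplified score: sqrt(tf) * idf.
    // Real scoring has more factors (norms, boosts, coord, query norm).
    static double score(int tf, double idf) {
        return Math.sqrt(tf) * idf;
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one query term in two separate indices.
        long numDocsA = 1000, docFreqA = 9;    // term is rare in index A
        long numDocsB = 1000, docFreqB = 499;  // term is common in index B
        int tf = 4;                            // same term frequency in both hits

        // Each index scores its hit with its own, local idf.
        double localA = score(tf, idf(docFreqA, numDocsA));
        double localB = score(tf, idf(docFreqB, numDocsB));

        // One idf computed from the summed statistics of both indices.
        double globalIdf = idf(docFreqA + docFreqB, numDocsA + numDocsB);
        double global = score(tf, globalIdf);

        System.out.printf("local score, index A:  %.3f%n", localA);
        System.out.printf("local score, index B:  %.3f%n", localB);
        System.out.printf("global score, either:  %.3f%n", global);
    }
}
```

With local statistics, the hit from index A outranks the otherwise identical hit from index B when the two result lists are merged by score. With the summed statistics, both hits get the same score.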
However, I'm not sure this analysis is completely correct, because of MultiSearcher.docFreq(), which appears to be trying to redefine the document frequencies (and hence the idf's) to be the global values across all indices. It wasn't clear to me how this code is ever reached, e.g. from TermQuery --> SegmentTermDocs. If the tf's and idf's are in fact computed globally, then the interleaving should work as it is, so I'm guessing they are not.

This raises the question of the desired semantics. Computing the tf's and idf's globally seems right for apps that use multiple indices strictly for scalability reasons, while issuing separate searches with properly comparable but separate scoring on each seems right for meta-search. If the scalability case isn't working right (i.e., if MultiSearcher is not computing the tf's and idf's across the entire collection of indices), fixing it would require a different approach than what I've proposed.

If I've missed the actual problem entirely, please let me know.

Thanks,

Chuck

> -----Original Message-----
> From: Daniel Naber [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 21, 2004 11:33 AM
> To: Lucene Developers List
> Subject: Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring
>
> On Thursday 21 October 2004 20:00, Chuck Williams wrote:
>
> > Thanks Otis. Other than trying to get some consensus a) that this is a
> > problem worth fixing, and b) on the best approach to fix it, my central
> > question is, if I fix it is it likely to get incorporated back into
> > Lucene?
>
> Chuck,
>
> sorry, I also lack the time and knowledge to follow this discussion, but
> what I consider a problem is that you currently cannot search over several
> indices without getting an incorrect ranking (except these indices were
> built from splitting one large index). Is that also something you're
> trying to solve?
>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]