Right. And the typical answer to that is: - If your terms are roughly equally distributed in all N indices (e.g. random doc->index/shard assignment), the relevance score will roughly match.
- If you have business rules for doc->index/shard distribution, then your relevance scores will not be comparable. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- > From: Toke Eskildsen <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, May 2, 2008 12:13:04 AM > Subject: Re: Does Lucene Supports Billions of data > > From: John Wang > [...] > > sub index 1: 1 billion docs > > sub index 2: 1 billion docs > > sub index 3: 1 billion docs > > > > federating search to these subindexes, you represent an index of 3 > > billiondocs, and all internal doc ids are of type int. > > That falls under Daniel's "...unless you wrap your own framework around it". > The > problem with the solution you're describing is that it's not functionally > equivalent to a single index of 3 billion docs. > > If you just create 3 independent indexes and merge the top hits from all 3, > the > ranking of the documents will be messed up. You'll need to make sure that the > scores from the different indexes can be compared. That's tricky when the > score > depends on the frequency of the terms in the whole corpus. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]