Hi Yonik,

Can we do the same for Lucene? The problem is combining the rewritten queries using the broken method in Query.

As far as I know, the problem is that queries such as MultiTermQueries (MTQs) rewrite *per searcher*, so each searcher uses a different rewritten query (with different terms). The scores are therefore totally different, even with a tf-idf patch (Fuzzy scores on MultiSearcher and Solr are totally wrong because each shard uses a different rewritten query). To work around that, the Query class has a broken, broken, broken, broken, broken method to combine queries, which violates De Morgan's laws when there are, e.g., negative clauses. And this method cannot be fixed to work with all queries (https://issues.apache.org/jira/browse/LUCENE-2756).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Yonik Seeley
> Sent: Monday, November 22, 2010 6:29 PM
> To: [email protected]
> Subject: Re: best practice: 1.4 billions documents
>
> On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler <[email protected]> wrote:
> > The latest discussion was more about MultiReader vs. MultiSearcher.
> >
> > But you are right, 1.4 B documents is not easy to go, especially when
> > your index grows and you get to the 2.1 B marker, then no MultiSearcher
> > or whatever helps.
> >
> > On the other hand even distributed Solr has the same problems like
> > MultiSearcher: scoring MultiTermQueries (Fuzzy) doesn't work correctly
>
> Are you referring to the idf being local to the shard instead of global to the
> whole collection?
> Andrzej has a patch in the works, but it's not committed yet.
>
> > negative MTQ clauses may produce wrong results if the query rewriting
> > is done like in MultiSearcher (which is unsolveable broken and broken
> > and broken and again broken for some queries as Boolean clauses - see
> > DeMorgan laws).
>
> I don't think this is a problem for Solr. Queries are executed on each shard as
> normal (no difference from a non-distributed query).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
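
To make the negative-clause problem discussed in this thread concrete, here is a toy sketch in Python. It is not Lucene code. It assumes a prefix query as a stand-in for any multi-term query, models documents as plain sets of terms, and models the combine step as a simple OR of the per-searcher rewritten queries, which is a simplification. The names expand, matches, combined_or and global_rewrite are made up for this illustration.

```python
# Toy model only: documents are sets of terms, and a "multi-term query"
# is a prefix that expands to whatever matching terms a shard's index holds.

def expand(prefix, shard):
    """Rewrite a prefix against one shard: the matching terms that shard contains."""
    return {t for doc in shard for t in doc if t.startswith(prefix)}

def matches(doc, required, prohibited):
    """Boolean query of the form: +required -(any term in prohibited)."""
    return required in doc and not (doc & prohibited)

shard1 = [{"a", "foo"}, {"a", "x"}]
shard2 = [{"a", "fox"}, {"a", "y"}]

# Each searcher rewrites "+a -fo*" against its own terms only.
rewritten = [expand("fo", s) for s in (shard1, shard2)]

def combined_or(doc):
    # Assumed combine step: OR of the per-searcher rewritten queries.
    return any(matches(doc, "a", p) for p in rewritten)

def global_rewrite(doc):
    # Rewrite once against all terms: NOT (foo OR fox) == NOT foo AND NOT fox.
    return matches(doc, "a", set().union(*rewritten))

all_docs = shard1 + shard2
# Documents that the OR-combined query accepts but the original query excludes.
wrong = [d for d in all_docs if combined_or(d) and not global_rewrite(d)]
print(wrong)
```

In this model the documents containing "foo" and "fox" slip through: each one is excluded by the rewritten query of its own shard but accepted by the rewritten query of the other shard, so an OR over the two lets both in. The negated expansions would have to be ANDed instead, which is the De Morgan point.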
