Sorry for the mispost -- fingers slipped...

Yes, but this is part of the point. Lucene is a field-based search engine, and its built-in support for taking simple queries and searching across the relevant fields is poor. The fact that it requires all terms in all fields is part of the problem. Once that is addressed, another problem is that Lucene does not provide a good mechanism to ensure that a diversity of terms is matched across the distinct fields (as opposed to the same term matching in multiple fields). Once both of those problems are addressed, it becomes possible to use field boosting in concert with tf*idf and length normalization to obtain better relevance ranking for the simple types of queries that users most often enter (just words and phrases -- no field specs or other syntactic sugar).
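To make the two problems concrete, here is a toy Python sketch. It is not Lucene code and does not reproduce any of the classes discussed in this thread: the documents, the per-field scores, and all function names are made up for illustration. The "max per term" combination is only an assumption about the general idea of rewarding distinct terms over one term repeated across fields.

```python
# Toy illustration only: hypothetical data and helper names, no Lucene API.

def all_terms_in_all_fields(doc, terms, fields):
    """The strict behavior: every term must occur in every field."""
    return all(term in doc[field] for field in fields for term in terms)

def each_term_in_some_field(doc, terms, fields):
    """The desired behavior: every term must occur in at least one field."""
    return all(any(term in doc[field] for field in fields) for term in terms)

def sum_over_fields(term_scores):
    """Add up every field match, so one term matching twice counts twice."""
    return sum(sum(per_field.values()) for per_field in term_scores.values())

def max_over_fields(term_scores):
    """Count each term once, using its best-scoring field."""
    return sum(max(per_field.values(), default=0.0)
               for per_field in term_scores.values())

terms = ["lucene", "scoring"]
fields = ["title", "body"]

# doc_a repeats one query term in both fields; doc_b covers both terms.
doc_a = {"title": {"lucene"}, "body": {"lucene", "index"}}
doc_b = {"title": {"lucene"}, "body": {"scoring", "benchmark"}}

# Made-up per-term, per-field scores (standing in for boosted tf*idf values).
scores_a = {"lucene": {"title": 1.0, "body": 1.0}, "scoring": {}}
scores_b = {"lucene": {"title": 1.0}, "scoring": {"body": 0.75}}

# Strict matching rejects doc_b even though it covers the whole query.
strict_b = all_terms_in_all_fields(doc_b, terms, fields)
relaxed_b = each_term_in_some_field(doc_b, terms, fields)

# Under OR semantics, summing favors doc_a; taking the max favors doc_b.
sum_prefers_a = sum_over_fields(scores_a) > sum_over_fields(scores_b)
max_prefers_b = max_over_fields(scores_b) > max_over_fields(scores_a)
```

With these invented numbers, the strict test rejects the document that actually covers both query terms, and plain summation ranks the document repeating a single term above it. Taking the best field per term reverses that ordering.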
DistributingMultiFieldQueryParser, MaxDisjunctionQuery, MaxDisjunctionScorer and WikipediaSimilarity work in concert to do all of this. I'd like to compare them first against the default approach in Lucene (MultiFieldQueryParser, BooleanQuery and DefaultSimilarity). If they are not better than that baseline, then there is little point in going on. If they are better than the default configuration, then it would make sense to explore permutations, e.g., compare against an improved version of MultiFieldQueryParser that still uses BooleanQuery (but with ORs) and DefaultSimilarity. Etc.

Make sense?

Chuck

> -----Original Message-----
> From: Daniel Naber [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 1:44 PM
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> On Friday 28 January 2005 17:53, Chuck Williams wrote:
>
> > I think the baseline should use Lucene's MultiFieldQueryParser to expand
> > the query to search both title and body fields, as this is presumably
> > the current "out-of-the-box" solution.
>
> Please remember that this is kind of buggy in Lucene 1.4: it will rewrite
> AND queries so that all terms are required in *all* fields, which is not
> what one wants for title/body searches.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]