Sorry for the mispost -- fingers slipped...

Yes, but this is part of the point.  Lucene is a field-based search
engine, and its built-in support for taking simple queries and searching
across the relevant fields is poor.  The fact that it requires all terms
in all fields is part of the problem.  Once that is addressed, another
problem is that Lucene does not provide a good mechanism to ensure that
a diversity of terms is matched across the distinct fields (in contrast
to the same term matching in multiple fields).  Once both of those
problems are addressed, it then becomes possible to use field boosting
in concert with tf*idf and length normalization to obtain better
relevance ranking for the simple types of queries that users most often
enter (just words and phrases -- no field specs or other syntactic
sugar).
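
To make the first problem concrete, here is a small sketch in plain
Python (not Lucene code; the function names and the toy document are
made up for illustration).  It contrasts requiring every term in every
field with requiring every term in at least one field:

```python
# Toy illustration only: each field is modeled as a set of terms.
doc = {
    "title": {"lucene", "scoring"},
    "body": {"benchmark", "evaluation", "scoring"},
}

def all_terms_in_all_fields(terms, doc, fields):
    # The behavior complained about: every term must occur in every field.
    return all(term in doc[field] for term in terms for field in fields)

def each_term_in_some_field(terms, doc, fields):
    # What a title/body search wants: every term occurs in at least one field.
    return all(any(term in doc[field] for field in fields) for term in terms)

query = ["lucene", "benchmark"]
fields = ["title", "body"]
print(all_terms_in_all_fields(query, doc, fields))  # False
print(each_term_in_some_field(query, doc, fields))  # True
```

The document matches "lucene" only in the title and "benchmark" only in
the body, so the first rule rejects it even though it is a reasonable
hit for the query.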

DistributingMultiFieldQueryParser, MaxDisjunctionQuery,
MaxDisjunctionScorer and WikipediaSimilarity work in concert to do all
of this.  I'd like to compare them first against the default approach in
Lucene (MultiFieldQueryParser, BooleanQuery and DefaultSimilarity).  If
they are not better than that, then there is little point in going on.
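
A rough sketch of the scoring idea, again in plain Python rather than
Lucene.  The per-field scores are invented, and it is only an assumption
that a max-style disjunction takes the best field per term instead of
summing over fields; the actual MaxDisjunctionScorer may differ in
detail:

```python
# Invented per-(term, field) scores, assumed to already include field boosts.
doc_a = {"lucene": {"title": 2.0, "body": 1.5}}                # one term, two fields
doc_b = {"lucene": {"title": 2.0}, "scoring": {"body": 1.0}}   # two distinct terms

def sum_over_fields(doc, terms):
    # Plain sum: the same term matching in several fields keeps adding score.
    return sum(sum(doc.get(t, {}).values()) for t in terms)

def max_over_fields(doc, terms):
    # Max per term: a term counts once, through its best-scoring field.
    return sum(max(doc.get(t, {}).values(), default=0.0) for t in terms)

terms = ["lucene", "scoring"]
for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, sum_over_fields(doc, terms), max_over_fields(doc, terms))
```

With the plain sum, doc_a outranks doc_b although it matches only one of
the two query terms; with the per-term maximum, doc_b comes out ahead.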

If they are better than the default configuration, then it would make
sense to explore permutations, e.g., compare against an improved version
of MultiFieldQueryParser that still uses BooleanQuery (but with ORs)
and DefaultSimilarity.  Etc.

Make sense?

Chuck

  > -----Original Message-----
  > From: Daniel Naber [mailto:[EMAIL PROTECTED]
  > Sent: Friday, January 28, 2005 1:44 PM
  > To: Lucene Developers List
  > Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed with
  > Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
  > 
  > On Friday 28 January 2005 17:53, Chuck Williams wrote:
  > 
  > > I think the baseline should use Lucene's MultiFieldQueryParser to expand
  > > the query to search both title and body fields, as this is presumably
  > > the current "out-of-the-box" solution.
  > 
  > Please remember that this is kind of buggy in Lucene 1.4: it will rewrite
  > AND queries so that all terms are required in *all* fields, which is not
  > what one wants for title/body searches.
  > 
  > Regards
  >  Daniel
  > 
  > --
  > http://www.danielnaber.de
  > 
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: [EMAIL PROTECTED]
  > For additional commands, e-mail: [EMAIL PROTECTED]

