boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)

Aaron Lav Wed, 24 Feb 2010 13:21:27 -0800

On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <gsing...@apache.org>wrote:
> 
> > What would it be?
> >
> 
> For scoring to take into account the non-analyzed token stream.
> 
> That is, if a field is analyzed (stemmed, lowercased, maybe even stop words
> removed), that is fine for indexing. But tokens in the query matching the
> original form could still get a higher score than those that only match when
> analyzed.


You can get some of that effect by indexing stemmed and unstemmed
forms, and letting IDF boost unstemmed results.  (I picked this
idea up from http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/)

> Also, this would maybe allow a flexible, run-time, decision of what
> analyzers to include. For example, I might want stemming turned on for
> normal search, but not for a PhraseQuery.

That's harder - different field names for the different analyses might
work, but not for run-time decisions.  I think the way Sun's Minion does
it is morphologically-based query expansion (see 
http://blogs.sun.com/searchguy/entry/lightweight_morphology_vs_stemming), and 
you might be able to
implement that via query rewriting.

     Aaron Lav (a...@pobox.com)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)

Reply via email to