On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote: > On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <gsing...@apache.org>wrote: > > > What would it be? > > > > For scoring to take into account the non-analyzed token stream. > > That is, if a field is analyzed (stemmed, lowercased, maybe even stop words > removed), that is fine for indexing. But tokens in the query matching the > original form could still get a higher score than those that only match when > analyzed.
You can get some of that effect by indexing stemmed and unstemmed forms, and letting IDF boost unstemmed results. (I picked this idea up from http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/) > Also, this would maybe allow a flexible, run-time, decision of what > analyzers to include. For example, I might want stemming turned on for > normal search, but not for a PhraseQuery. That's harder - different field names for the different analyses might work, but not for run-time decisions. I think the way Sun's Minion does it is morphologically-based query expansion (see http://blogs.sun.com/searchguy/entry/lightweight_morphology_vs_stemming), and you might be able to implement that via query rewriting. Aaron Lav (a...@pobox.com) --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org