Re: Stemming and Wildcard Queries

Herbert Roitblat Thu, 20 May 2010 13:48:32 -0700

At a general level, we have found that stemming during indexing is not advisable. Sometimes users want the exact form, and if you have removed the exact form during indexing, you obviously cannot provide it. Rather, we have found that stemming during search is more useful, or maybe it should be called anti-stemming. For any given input that the user wants stemmed, we can derive the variations during query processing. E.g., plan can be expanded to include plans, planning, planned, etc.
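To make the query-time expansion concrete, here is a minimal sketch in plain Java. It is not Lucene API and not necessarily how our application does it: the class QueryTimeExpander, the suffix list, and the consonant-doubling rule are all assumptions chosen for illustration. It generates candidate variants of the query term and keeps only those that actually occur in the (unstemmed) index vocabulary.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Lucene): expands a query term into
// inflected variants at search time, so the index can keep exact forms.
final class QueryTimeExpander {
    // Assumed, deliberately tiny suffix list; a real system would use a
    // proper morphological lexicon or a stemmer run over the vocabulary.
    private static final String[] SUFFIXES = {"s", "ed", "ing"};

    // Returns the term itself plus every candidate variant found in the
    // vocabulary, in a stable order.
    static Set<String> expand(String term, Set<String> vocabulary) {
        Set<String> variants = new LinkedHashSet<>();
        variants.add(term); // the exact form is always searched
        if (term.isEmpty()) {
            return variants;
        }
        List<String> bases = new ArrayList<>();
        bases.add(term);
        char last = term.charAt(term.length() - 1);
        if ("aeiou".indexOf(last) < 0) {
            // Also try a doubled final consonant (plan -> plann + ed).
            // Wrong guesses are harmless: they are filtered by the vocabulary.
            bases.add(term + last);
        }
        for (String base : bases) {
            for (String suffix : SUFFIXES) {
                String candidate = base + suffix;
                if (vocabulary.contains(candidate)) {
                    variants.add(candidate);
                }
            }
        }
        return variants;
    }

    // Joins the variants into a boolean OR expression.
    static String toOrQuery(Set<String> variants) {
        return "(" + String.join(" OR ", variants) + ")";
    }
}
```

With a vocabulary containing plan, plane, planes, planet, planned, planner, planning and plans, expand("plan", vocabulary) yields plan, plans, planned, planning, and toOrQuery gives "(plan OR plans OR planned OR planning)". Note that plane and planet are not pulled in, which is the difference between this and a plain prefix match.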

In our application we provide a feature that is sometimes called a word wheel. When someone enters plan in this tool, we show all of the words in the index that start with plan. Here are some of the related words:

plan
plane
planes
planet
planificaci
planned
plannedoutages.xls
planner
planners
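
For illustration only, the word wheel lookup can be sketched as a prefix range over a sorted set of index terms. This is plain Java, not Lucene API; the class WordWheel and its methods are hypothetical names, and an actual implementation would read the terms from the index's term dictionary rather than an in-memory set.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical word wheel: lists every indexed term that starts with
// whatever the user has typed so far.
final class WordWheel {
    // Sorted term dictionary; all terms sharing a prefix are contiguous.
    private final NavigableSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Returns all terms t with prefix <= t < prefix + '\uffff', i.e. every
    // term starting with the prefix (ignoring the unlikely case of a term
    // whose next character is '\uffff' itself).
    SortedSet<String> startingWith(String prefix) {
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```

After adding the terms listed above plus unrelated ones such as place and plot, startingWith("plan") returns exactly the nine plan... terms in the order shown, and leaves the user to pick the forms they actually want.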


Just a thought.
Herb

----- Original Message -----
From: "Ivan Provalov" <iprov...@yahoo.com>
To: <java-user@lucene.apache.org>
Sent: Thursday, May 20, 2010 1:16 PM
Subject: Stemming and Wildcard Queries

Is there a good way to combine the wildcard queries and stemming?
As is, a field that is stemmed at index time won't work with some wildcard queries.
We were thinking of creating two separate index fields, one stemmed and one non-stemmed, but we are having issues with our SpanNear queries (they require the same field).
We thought of combining the stemmed and non-stemmed terms in the same field, but we are concerned about the stats being skewed as a result (especially the TermVector stats). Can overloading the non-stemmed field with stemmed terms cause any issues with the TermVector?
Any suggestions?

Ivan Provalov




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



