The way I've always done this is to index two fields, say "contents"
and "contents_unstemmed" (using a PerFieldAnalyzerWrapper so each field
gets its own analyzer), and then query on both of them.  This has two
effects: a) it boosts unstemmed hits, because every unstemmed match is
also a stemmed one, so the BooleanQuery combining the stemmed and
unstemmed queries scores those documents higher; and b) it lets you
query *only* the unstemmed field if, e.g., the user puts their search
term in quotes, indicating they really want an exact match.
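
To make the scoring effect concrete, here is a toy model of the two-field
idea in plain Python.  It is not Lucene code: toy_stem, build_index and
search are made-up names, the "stemmer" is a crude suffix stripper, and the
score is just a count of matching tokens summed over both fields, which only
roughly mimics what a BooleanQuery over two optional clauses would do.

```python
def toy_stem(token):
    """Crude stand-in for a real stemmer (hypothetical, not Porter)."""
    if token.endswith("'s"):
        token = token[:-2]
    for suffix in ("es", "er", "e", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Store every document under two fields, one stemmed and one not."""
    index = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        index[doc_id] = {
            "contents": [toy_stem(t) for t in tokens],
            "contents_unstemmed": tokens,
        }
    return index


def search(index, query):
    """Query both fields, or only the unstemmed one if the query is quoted."""
    exact_only = len(query) >= 2 and query[0] == '"' and query[-1] == '"'
    terms = tokenize(query.strip('"'))
    scores = {}
    for doc_id, fields in index.items():
        score = 0
        for term in terms:
            # An exact hit always counts in the unstemmed field.
            score += fields["contents_unstemmed"].count(term)
            if not exact_only:
                # It counts again in the stemmed field, hence the boost.
                score += fields["contents"].count(toy_stem(term))
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index({
    "a": "green olives and feta",
    "b": "oliver went home",
})
print(search(index, "olives"))    # both docs match, "a" ranks first
print(search(index, '"olives"'))  # only the exact match "a" is returned
```

With the unquoted query, document "a" matches in both fields and document
"b" only in the stemmed one, so "a" comes out on top; the quoted query skips
the stemmed field entirely and "b" drops out.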

  -jake



On 2/11/08, Michael Stoppelman <[EMAIL PROTECTED]> wrote:
> Hi all,
> I've got an index with tokens that are stemmed. Sometimes I really need to
> boost the unstemmed
> version of a query word to get the most relevant documents.
>
> Example:
> Query: [olives].
>
> I don't want to match documents with the words: oliver, oliver's, etc...
>
> Since I'm stemming when creating the index, is there a way to store both
> versions (stemmed/unstemmed) with
> setPositionIncrement()? Is that the correct way to deal with this? I was
> reading old archives and this didn't seem
> to be a great decision since it breaks PhraseQuery [1].
>
> It seems like it would be useful if, at query scoring time, I could see the
> original string values of the tokens, in this case
> at least.
>
> Thanks in advance,
>
> -M
>
> [1] http://www.mail-archive.com/[EMAIL PROTECTED]/msg07416.html
>
