On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:
>
> Wrapping the WhitespaceAnalyzer with the ngramfilter creates unigrams and
> the ngrams that I indicate, while maintaining the whitespaces. :)
> The reason I'm doing this is because I only wish to index names with more
> than one token.
>
Then I am not sure I understand you.
Take this input text:
text by John Bear, old.
A WhitespaceAnalyzer would create these tokens:
text
by
John
Bear,
old.
An NGramTokenFilter(2,2) wrapping it would create these tokens:
te
ex
xt
by
Jo
... etc.
You may use other limits, but still no token would have
whitespace in it.
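
To make the point concrete, here is a rough sketch in plain Python. It is not Lucene code: the function names whitespace_tokens, char_ngrams and ngram_filter are made up for illustration. It assumes the filter simply emits every character n-gram of each whitespace-delimited token, token by token. The exact order in which a real filter emits grams of different sizes may differ.

```python
def whitespace_tokens(text):
    """Split on runs of whitespace, roughly what a whitespace tokenizer does."""
    return text.split()


def char_ngrams(token, min_n, max_n):
    """Return every character n-gram of one token, for n in [min_n, max_n]."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def ngram_filter(tokens, min_n, max_n):
    """Apply char_ngrams to each token in turn (hypothetical stand-in for the filter)."""
    out = []
    for tok in tokens:
        out.extend(char_ngrams(tok, min_n, max_n))
    return out


tokens = whitespace_tokens("text by John Bear, old.")
bigrams = ngram_filter(tokens, 2, 2)
print(tokens)
print(bigrams)
```

Because the n-grams are cut from tokens that were already split on whitespace, no gram can span two tokens, whatever limits are passed in.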