Re: Basic Named Entity Indexing

Doron Cohen Tue, 08 Jan 2008 14:14:40 -0800

On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:

>
> Wrapping the WhitespaceAnalyzer with the ngram filter, it creates unigrams and
> the ngrams that I indicate, while maintaining the whitespace. :)
> The reason I'm doing this is because I only wish to index names with more
> than one token.
>



Then I am not sure I understand you.
Take this input text:

     text by John Bear, old.

A WhitespaceAnalyzer would create these tokens:

   text
   by
   John
   Bear,
   old.
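To make the point concrete, here is a minimal plain-Java sketch of whitespace tokenization. It does not use the Lucene API. The class and method names (WhitespaceSketch, tokenize) are hypothetical. It assumes that splitting on runs of whitespace is close enough to what WhitespaceAnalyzer does: punctuation stays attached to the token, as in "Bear," and "old." above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch, not Lucene's WhitespaceAnalyzer:
// split on runs of whitespace and keep punctuation attached to tokens.
class WhitespaceSketch {

    // Returns the whitespace-delimited tokens of the input, in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            // "".split(...) yields one empty string, so skip empties.
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line: text, by, John, Bear, and old.
        for (String token : tokenize("text by John Bear, old.")) {
            System.out.println(token);
        }
    }
}
```

No token produced this way can contain a space, because the spaces are exactly where the input is cut.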

An NGramTokenFilter(2,2) (minimum and maximum gram size 2) wrapping it would create these character bigrams:

   te
   ex
   xt
   by
   Jo
   ... etc.

You may use other limits, but still no token would contain whitespace:
each n-gram is cut from a single whitespace-delimited token.
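Here is a minimal plain-Java sketch of the same pipeline. It does not use the Lucene API. The names (CharNGramSketch, ngrams, analyze) are hypothetical. It assumes the filter emits every character substring of length minGram..maxGram of each token independently. The output order may differ from Lucene's filter for wider limits, but the set of grams is what matters here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch, not Lucene's NGramTokenFilter:
// whitespace tokenization followed by character n-grams per token.
class CharNGramSketch {

    // Emits every substring of length minGram..maxGram of one token.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return grams;
    }

    // Splits the text on whitespace, then n-grams each token on its own,
    // so no gram can ever span the boundary between two tokens.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            out.addAll(ngrams(token, minGram, maxGram));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> grams = analyze("text by John Bear, old.", 2, 2);
        System.out.println(grams);
        // Check whether any gram contains a whitespace character.
        boolean anyWhitespace = grams.stream().anyMatch(g -> g.matches(".*\\s.*"));
        System.out.println("contains whitespace: " + anyWhitespace);
    }
}
```

Raising maxGram (say, analyze(text, 1, 10)) only yields longer substrings of the same tokens, which is why changing the limits cannot produce a multi-word token.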
