--- On Sat, 9/12/09, Paul Taylor <paul_t...@fastmail.fm> wrote:

> From: Paul Taylor <paul_t...@fastmail.fm>
> Subject: Filter before tokenize ?
> To: java-user@lucene.apache.org
> Date: Saturday, September 12, 2009, 9:39 PM
> Is it possible to filter before
> tokenize, or is that not a good idea.
> I want to convert '&' to 'and' , so they are dealt with
> the same way, but the StandardTokenizer I am using removes
> the &, I could change the tokenizer but  because
> I'm not too clear on jflex syntax it would seem easier to
> just apply a CharFilter before tokenizing, but is that
> possible

May be you can use WhitespaceTokenizer that won't remove &?
Why and's (&) are import for you? Do you need to search them?
Replacing &'s before indexing (by preprocessing) can be a option?


Filter before tokenizer can be simulated by using:

1-)KeywordTokenizer 
2-)Your CharFilter
3-)A token filter that tokenizes input token's text using StandardTokenizer

But i think this is not a good idea.

Hope this helps.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to