RE: Tokenizer

Ard Schrijvers Mon, 30 Jul 2007 09:15:32 -0700

Hello,

> I have two questions.
> 
> First, Is there a tokenizer that takes every word and simply 
> makes a token
> out of it?


org.apache.lucene.analysis.WhitespaceTokenizer

> So it looks for two white spaces and takes the characters
> between them and makes a token out of them?
> 
> If this tokenizer exists, is there a difference between doing that and
> simply storing the field in the document with Field.Index = 
> UN_TOKENIZED?

Yes certainly it is different. UN_TOKENIZED is as it says, taking a String for 
example and put it "AS IS", as one single TERM in your index. For example you 
might want to do this when you want to sort on caption of a document, or title. 
TOKENIZED in combination with the 
org.apache.lucene.analysis.WhitespaceTokenizer tokenizes your string and 
indexes.

Regards Ard

> 
> --JP
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Tokenizer

Reply via email to