Hello, > I have two questions. > > First, Is there a tokenizer that takes every word and simply > makes a token > out of it?
org.apache.lucene.analysis.WhitespaceTokenizer > So it looks for two white spaces and takes the characters > between them and makes a token out of them? > > If this tokenizer exists, is there a difference between doing that and > simply storing the field in the document with Field.Index = > UN_TOKENIZED? Yes certainly it is different. UN_TOKENIZED is as it says, taking a String for example and put it "AS IS", as one single TERM in your index. For example you might want to do this when you want to sort on caption of a document, or title. TOKENIZED in combination with the org.apache.lucene.analysis.WhitespaceTokenizer tokenizes your string and indexes. Regards Ard > > --JP > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]