The problem with WhitespaceAnalyzer is this: if the text contains a sentence such as "lucene is indexing.", a query for "indexing" will produce no hits, because "." is not a token delimiter and the indexed token is "indexing." with the period attached. You would have to search for "indexing*" instead.
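To make the problem concrete, here is a minimal sketch in plain Java. It does not use Lucene at all; the class WhitespaceSplitDemo and its methods are hypothetical names made up for illustration, and the splitting only approximates what a whitespace-based tokenizer does.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSplitDemo {
    // Split on runs of whitespace only; punctuation stays glued to the word.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Exact term match, like a plain query for a single word.
    public static boolean matchesTerm(List<String> tokens, String term) {
        return tokens.contains(term);
    }

    // Prefix match, like a trailing-wildcard query such as indexing*.
    public static boolean matchesPrefix(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("lucene is indexing.");
        System.out.println(tokens);                            // [lucene, is, indexing.]
        System.out.println(matchesTerm(tokens, "indexing"));   // false
        System.out.println(matchesPrefix(tokens, "indexing")); // true
    }
}
```

The last token keeps its trailing period, so only the prefix match finds it.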
For me the solution was to write my own tokenizer/analyzer pair:

--snip
// Each public class goes in its own .java file.
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;

public final class myTokenizer extends CharTokenizer {
  /** Construct a new myTokenizer. */
  public myTokenizer(Reader in) {
    super(in);
  }

  /** Lowercases every character that becomes part of a token. */
  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }

  /** Collects only characters which satisfy
   * {@link Character#isLetterOrDigit(char)}, so punctuation such as "."
   * ends a token. */
  protected boolean isTokenChar(char c) {
    return Character.isLetterOrDigit(c);
  }
}

public final class myAnalyzer extends Analyzer {
  public final TokenStream tokenStream(String fieldName, Reader reader) {
    return new myTokenizer(reader);
  }
}
--snip

regards
joe

"RAYMOND Romain" <[EMAIL PROTECTED]> writes on Tue, 26 Mar 2002 08:53:51 +0100 (MET):
> hello,
>
> The solution we adopted is to use WhiteSpaceAnalyser.
> If you print the result of a query after parsing it (with parse method)
> the tokenizers used delete the numbers from the query.
> But WhiteSpaceAnalyser only tokenizes based on ... spaces, so we can
> search on numbers values ....
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>