[EMAIL PROTECTED] wrote:
Hi!

I build a list of all unique words in all my docs from WhitespaceAnalyzer.tokenStream(). I also index all my docs with a GermanAnalyzer in a separate index. Plenty of words in the word list return no hits when I search the index built with the GermanAnalyzer, and these are not stop words.

Why is this?


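Analyzers preprocess the text before it is indexed, and different Analyzers generate different tokens from the same input. WhitespaceAnalyzer only splits on whitespace, whereas GermanAnalyzer also normalizes its tokens (lowercasing and stemming, for example), so many words in your list never appear in the index in that exact form. Only you can know which Analyzer fits your needs, but you need to apply it consistently for indexing, searching, and generating the list of unique words if you want predictable results.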
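
To make the mismatch concrete, here is a minimal plain-Java sketch. It does not use Lucene: whitespaceTerms() and normalizedTerms() are hypothetical stand-ins for the two Analyzers, and the suffix stripping is a toy rule, not the real German stemmer. It only illustrates why a word list built one way misses terms indexed another way.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class AnalyzerMismatch {

    // Stand-in for a whitespace-only analyzer: split on whitespace, change nothing else.
    static Set<String> whitespaceTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Toy stand-in for a normalizing analyzer (assumption, NOT Lucene's GermanAnalyzer):
    // split on non-letters, lowercase, and strip at most one common suffix.
    static Set<String> normalizedTerms(String text) {
        String[] suffixes = {"en", "er", "e", "s"};
        Set<String> terms = new TreeSet<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String term = token.toLowerCase(Locale.GERMAN);
            for (String suffix : suffixes) {
                // Keep at least three characters so short words stay intact.
                if (term.endsWith(suffix) && term.length() - suffix.length() >= 3) {
                    term = term.substring(0, term.length() - suffix.length());
                    break;
                }
            }
            terms.add(term);
        }
        return terms;
    }

    // Words from the word list that have no exact counterpart among the indexed terms.
    static Set<String> unmatched(Set<String> wordList, Set<String> indexTerms) {
        Set<String> missing = new TreeSet<>(wordList);
        missing.removeAll(indexTerms);
        return missing;
    }

    public static void main(String[] args) {
        String doc = "Die Katzen schlafen, die Hunde bellen.";
        Set<String> indexTerms = normalizedTerms(doc);

        // Word list built with a different analyzer than the index: most words miss.
        System.out.println("mismatched: " + unmatched(whitespaceTerms(doc), indexTerms));

        // Word list built with the same analyzer as the index: nothing misses.
        System.out.println("consistent: " + unmatched(normalizedTerms(doc), indexTerms));
    }
}
```

With real Lucene classes the same idea applies: build the word list from the tokenStream() of the Analyzer that was used for indexing, and use that Analyzer when parsing queries as well.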


markus

