Let's say that for the query "algorithm", the word "algorith" is also a match.
How does the highlighter know that it should also highlight
occurrences of the word "algorith"? (I am not sure it does this anyway.)
The highlighter knows to highlight stemmed words because both the query
terms and the document content are fed through (hopefully) the same
analyzer, so that "algorithmic", "algorithm", "algorithms" etc. are
stemmed to the same root form in both the query and the document content.
The tokens produced by analyzers carry the start and end character
offsets of the *original* full word, not just the stemmed form, so the
highlighter knows the full extent of what to highlight in the text.
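
To make the mechanism concrete, here is a minimal sketch in Python. It is
not Lucene's API: toy_stem, analyze and highlight are hypothetical names,
and the suffix-stripping rule is a crude stand-in for a real stemmer. It
only illustrates the idea that matching happens on the stemmed form while
the highlighted span comes from the offsets of the original word.

```python
import re

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: lowercase the word and
    # strip one of a couple of suffixes if enough of the word remains.
    word = word.lower()
    for suffix in ("ic", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def analyze(text):
    # Toy "analyzer": yields (stem, start, end) for each word, where
    # start/end are character offsets of the ORIGINAL word in the text.
    for match in re.finditer(r"\w+", text):
        yield toy_stem(match.group()), match.start(), match.end()

def highlight(text, query):
    # Run the query through the same analyzer as the document text.
    query_stems = {stem for stem, _, _ in analyze(query)}
    out, last = [], 0
    for stem, start, end in analyze(text):
        if stem in query_stems:
            # Match on the stem, but wrap the original full word.
            out.append(text[last:start])
            out.append("<b>" + text[start:end] + "</b>")
            last = end
    out.append(text[last:])
    return "".join(out)

print(highlight("Algorithmic tricks make algorithms fast.", "algorithm"))
# prints: <b>Algorithmic</b> tricks make <b>algorithms</b> fast.
```

Note that if the query were analyzed differently from the document (for
example, not stemmed at all), the stems would no longer line up and
nothing would be highlighted, which is why using the same analyzer on
both sides matters.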
Cheers
Mark