Marjan Celikik wrote:
Mark Miller wrote:
The Highlighter works by comparing the TokenStream of the document
with the Tokens in the query. The TokenStream can be rebuilt from the
index if you use TermVectors with TokenSources or you can get it by
reanalyzing the document. Each Token from the TokenStream is checked
against Tokens in the query, and if there is a match you have a
Highlight. The original text is then reconstructed with the
Highlights from info in the TokenStream about original offsets into
the document for each Token. Also, there is a Fragment system that
will break apart the Highlighted text into score sorted text Fragments.
OK, this is what I already knew...
That is why the original contrib does not work with PhraseQuery's. It
simply matches Tokens from the query with those in the TokenStream.
LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex.
Then, after converting the query to a SpanQuery approximation,
getSpans is called on the index for the query. The Spans provide a
bound on what positions should be Highlighted. Everything else is
done exactly like the original Highlighter (This is a patch that fits
into the original Highlighter framework that was developed, thereby
retaining all of its richness :) ).
Thanks! This is what I needed. Still I don't know how to obtain the
source code of your patch :(
ok, I will just apply the patch. Sorry.
Marjan.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]