Re: Highlighting + phrase queries

Mark Miller Thu, 10 Jan 2008 06:28:46 -0800

The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources, or you can get it by reanalyzing the document. Each Token from the TokenStream is checked against the Tokens in the query, and if there is a match you have a Highlight. The original text is then reconstructed with the Highlights, using the information in the TokenStream about each Token's original offsets into the document. There is also a Fragment system that breaks the Highlighted text apart into score-sorted text Fragments.
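A minimal sketch of that token-matching idea in plain Java follows, assuming a toy tokenizer (lowercased runs of letters and digits) in place of a Lucene Analyzer and TokenStream. The TermHighlighter class, its Token record and the <B> tags are hypothetical illustrations, not the contrib Highlighter's API, and the Fragment step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TermHighlighter {

    // One token: the normalized term plus its start/end character offsets in the original text.
    record Token(String term, int start, int end) {}

    // Toy analyzer (assumption): runs of letters/digits become lowercased tokens.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(), start, i));
        }
        return tokens;
    }

    // Checks each token against the query terms and rebuilds the original text,
    // using the stored offsets to wrap every matching token in <B>...</B>.
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0; // end offset of the previous highlight
        for (Token t : tokenize(text)) {
            if (!queryTerms.contains(t.term())) {
                continue;
            }
            out.append(text, last, t.start()); // unchanged text before this match
            out.append("<B>").append(text, t.start(), t.end()).append("</B>");
            last = t.end();
        }
        out.append(text, last, text.length()); // trailing unchanged text
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "The quick fox saw a quick dog. Fox!";
        System.out.println(highlight(doc, Set.of("quick", "fox")));
    }
}
```

Running main prints The <B>quick</B> <B>fox</B> saw a <B>quick</B> dog. <B>Fox</B>! Every occurrence of either term is marked, including the second "quick" and the final "Fox", because a plain term match knows nothing about where the terms sit relative to each other.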

That is why the original contrib Highlighter does not work with PhraseQuery: it simply matches Tokens from the query with those in the TokenStream, without regard to their positions. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSpans is called on the index for the query. The Spans provide a bound on which positions should be Highlighted. Everything else is done exactly as in the original Highlighter (this is a patch that fits into the original Highlighter framework, thereby retaining all of its richness :) ).
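To make the difference concrete, here is a plain-Java sketch of the position-bounded idea. It does not use MemoryIndex, SpanQuery or the LUCENE-794 patch: the hypothetical spanPositions method just scans token positions for an exact (slop 0) phrase match, standing in for what getSpans returns, and the toy tokenizer is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PhraseHighlighter {

    // Term, character offsets, and token position (0, 1, 2, ...).
    record Token(String term, int start, int end, int position) {}

    // Toy analyzer (assumption): lowercased runs of letters/digits, positions assigned in order.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int pos = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(), start, i, pos++));
        }
        return tokens;
    }

    // Stand-in for the getSpans() step: collects every position covered by an exact
    // (slop 0) occurrence of the phrase. Positions have no gaps here, so list index == position.
    static Set<Integer> spanPositions(List<Token> tokens, List<String> phrase) {
        Set<Integer> covered = new HashSet<>();
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size() && match; j++) {
                match = tokens.get(i + j).term().equals(phrase.get(j));
            }
            if (match) {
                for (int j = 0; j < phrase.size(); j++) {
                    covered.add(tokens.get(i + j).position());
                }
            }
        }
        return covered;
    }

    // Same offset-based rebuild as a term highlighter, but a token is marked only if
    // its position lies inside a matched span, not merely because its term matches.
    static String highlight(String text, List<String> phrase) {
        List<Token> tokens = tokenize(text);
        Set<Integer> covered = spanPositions(tokens, phrase);
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokens) {
            if (!covered.contains(t.position())) {
                continue;
            }
            out.append(text, last, t.start());
            out.append("<B>").append(text, t.start(), t.end()).append("</B>");
            last = t.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "The quick fox saw a quick dog. Fox!";
        System.out.println(highlight(doc, List.of("quick", "fox")));
    }
}
```

Running main prints The <B>quick</B> <B>fox</B> saw a quick dog. Fox! The second "quick" and the final "Fox" match query terms but fall outside the phrase span, so they stay unmarked. In the patch itself the bounds come from calling getSpans on the MemoryIndex rather than from a hand-written scan like this one.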


Marjan Celikik wrote:

Mark Miller wrote:
Oh yeah... something that you may not have seen is that this has a dependency on MemoryIndex from contrib. You need that jar as well.
- Mark
Hm, I need the source code. How do I download the files from https://issues.apache.org/jira/browse/LUCENE-794 (all I see are some .patch files)?
What I really need is how the highlighter works, in a nutshell... I am working on a publication and I want to have a reference to Lucene and its highlighting...
Thanks again.

Marjan.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
