I don't think you would see much of a gain. Shoving the TokenStream into the MemoryIndex is actually pretty fast, and I wouldn't be surprised if it was much faster than reading from disk. Most of the computational time is spent reconstructing the TokenStream, whether you use term vectors or re-analyze. Also, if the Query does not have any position-sensitive clauses, no MemoryIndex is created, so no worries there.

The main speed problem with the current method (other than needing a TokenStream created in the first place) is that it runs over every Token and stitches the document back together a piece at a time, which doesn't scale well on huge docs. There are ways to cut this down and analyze only the pertinent Tokens, as a different patch does. However, you'd need to have TermVectors stored, and the concept doesn't fit with the current Highlighter framework, which already has some significant functionality and robustness.
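
To make the "runs over every Token and stitches the document together" cost concrete, here is a minimal sketch in plain Java. It is not the Highlighter's code: the class TokenStitchSketch, the Tok type, the whitespace tokenizer, and the <B> markup are all assumptions made up for illustration. The point is only that the loop touches every token in the text, even when just one of them is a hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
class TokenStitchSketch {

    // A token reduced to its character offsets in the original text.
    static final class Tok {
        final int start, end;
        Tok(int start, int end) { this.start = start; this.end = end; }
    }

    // Toy whitespace tokenizer that records start/end offsets.
    static List<Tok> tokenize(String text) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) toks.add(new Tok(start, i));
        }
        return toks;
    }

    // Visits every token, copying the gap before it and the token itself
    // into the output, and wrapping the token when it is a hit.
    static String stitch(String text, Set<String> hits) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Tok t : tokenize(text)) {
            out.append(text, last, t.start);           // text between tokens
            String term = text.substring(t.start, t.end);
            if (hits.contains(term)) {
                out.append("<B>").append(term).append("</B>");
            } else {
                out.append(term);
            }
            last = t.end;
        }
        out.append(text.substring(last));              // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> hits = new HashSet<>(Arrays.asList("big"));
        System.out.println(stitch("a  big   dog", hits));
    }
}
```

The work here grows with the number of tokens in the document, not with the number of hits, which is why it hurts on huge docs.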

- Mark

Marjan Celikik wrote:
Mark Miller wrote:

That is why the original contrib does not work with PhraseQuery. It simply matches Tokens from the query against those in the TokenStream. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSpans is called on the index for the query. The Spans provide a bound on which positions should be highlighted. Everything else is done exactly as in the original Highlighter. (This is a patch that fits into the original Highlighter framework, thereby retaining all of its richness :) )
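
The difference between the two approaches can be sketched without Lucene at all. The toy below is an assumption-laden illustration, not the LUCENE-794 code: PhraseHighlightSketch and its methods are hypothetical names, the exact in-order phrase scan merely stands in for what getSpans would report, and the <B> markup is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Lucene.
class PhraseHighlightSketch {

    // Original contrib behavior (simplified): any token equal to any
    // query term is a hit, regardless of where it occurs.
    static Set<Integer> termMatchPositions(List<String> tokens, List<String> queryTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (queryTerms.contains(tokens.get(i))) hits.add(i);
        }
        return hits;
    }

    // Position-aware behavior (stand-in for spans): only positions that
    // fall inside an exact, in-order occurrence of the phrase are hits.
    static Set<Integer> phraseSpanPositions(List<String> tokens, List<String> phrase) {
        Set<Integer> hits = new TreeSet<>();
        if (phrase.isEmpty()) return hits;
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                for (int i = start; i < start + phrase.size(); i++) hits.add(i);
            }
        }
        return hits;
    }

    // Wraps the tokens at the given positions in <B>...</B>.
    static String highlight(List<String> tokens, Set<Integer> positions) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            out.add(positions.contains(i) ? "<B>" + t + "</B>" : t);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the quick fox saw a quick brown fox".split(" "));
        List<String> phrase = Arrays.asList("quick", "brown");
        // Term matching also marks the stray "quick" at position 1.
        System.out.println(highlight(tokens, termMatchPositions(tokens, phrase)));
        // Span-bounded matching marks only positions 5 and 6.
        System.out.println(highlight(tokens, phraseSpanPositions(tokens, phrase)));
    }
}
```

With plain term matching, the first "quick" is highlighted even though it is not part of the phrase; bounding the hits by position removes it while the rendering step stays the same.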


Mark, thanks for your patience! I have one final (conceptual, high-level) question concerning the use of the MemoryIndex over the TokenStream. Would it be a good idea to store the precomputed MemoryIndex (conceptually speaking) as a field in each document at indexing time, and then just load this precomputed index from disk (as you do with TermVectors), so that you save the extra computation when highlighting?

Marjan.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


