Re: Highlighting + phrase queries

Mark Miller Thu, 10 Jan 2008 08:18:45 -0800

I don't think you would see much of a gain. Shoving the TokenStream into the MemoryIndex is actually pretty fast, and I wouldn't be surprised if it was much faster than reading from disk. Most of the computational time is spent reconstructing the TokenStream, whether you use term vectors or re-analyze. Also, if the Query does not have any position-sensitive clauses, no MemoryIndex is created, so no worries there.

The great speed challenge of the current method (other than needing a TokenStream created) is that it runs over each Token and stitches the document together a piece at a time. This doesn't scale well on huge docs. There are ways to cut this down and analyze just the pertinent Tokens, as is done by a different patch. However, you'd need to have TermVectors stored, and the concept doesn't fit with the current Highlighter framework, which already has some significant functionality and robustness.
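To make the "pertinent Tokens only" idea concrete, here is a rough sketch in plain Java. It is not code from either patch and uses no Lucene classes: OffsetStitchSketch, Match and stitch are hypothetical names, and the character offsets are supplied by hand where a real implementation would presumably read them from stored TermVectors. The point is only that the output is assembled in large runs between matches rather than one Token at a time.

```java
import java.util.List;

// Toy illustration only: plain Java, no Lucene classes involved.
final class OffsetStitchSketch {

    // A matched region of the original text, as character offsets [start, end).
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Builds the highlighted text by copying whole runs between matches.
    // Assumes the matches are sorted by start offset and do not overlap.
    static String stitch(String text, List<Match> matches) {
        StringBuilder out = new StringBuilder(text.length() + matches.size() * 7);
        int cursor = 0;
        for (Match m : matches) {
            out.append(text, cursor, m.start);            // untouched run before the match
            out.append("<b>").append(text, m.start, m.end).append("</b>");
            cursor = m.end;
        }
        out.append(text, cursor, text.length());          // tail after the last match
        return out.toString();
    }
}
```

With this shape the work grows with the number of matches plus one bulk copy of the text, instead of with the number of Tokens in the document.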


- Mark

Marjan Celikik wrote:

Mark Miller wrote:
That is why the original contrib does not work with PhraseQuery. It simply matches Tokens from the query with those in the TokenStream. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSpans is called on the index for the query. The Spans provide a bound on what positions should be highlighted. Everything else is done exactly like the original Highlighter (this is a patch that fits into the original Highlighter framework that was developed, thereby retaining all of its richness :) ).
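The difference between the two strategies can be shown with a toy example in plain Java. This is only an illustration of the idea, not the Highlighter or LUCENE-794 code, and it calls no Lucene API: PhraseHighlightSketch and its methods are hypothetical names, and an exact-phrase scan over a token list stands in for the MemoryIndex plus getSpans step.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: plain Java, no Lucene classes involved.
final class PhraseHighlightSketch {

    // Term matching: every token equal to some query term counts as a hit,
    // regardless of where it sits in the document.
    static List<Integer> termMatchPositions(List<String> tokens, List<String> phrase) {
        Set<String> terms = new HashSet<>(phrase);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (terms.contains(tokens.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Position-aware matching: only positions that fall inside an exact
    // occurrence of the phrase count as hits (a stand-in for span bounds).
    static List<Integer> phrasePositions(List<String> tokens, List<String> phrase) {
        List<Integer> hits = new ArrayList<>();
        int n = phrase.size();
        if (n == 0) {
            return hits;
        }
        for (int start = 0; start + n <= tokens.size(); start++) {
            if (tokens.subList(start, start + n).equals(phrase)) {
                for (int p = start; p < start + n; p++) {
                    // Skip positions already recorded by an overlapping match.
                    if (hits.isEmpty() || hits.get(hits.size() - 1) < p) {
                        hits.add(p);
                    }
                }
            }
        }
        return hits;
    }

    // Wraps the tokens at the given positions in <b> tags.
    static String highlight(List<String> tokens, List<Integer> positions) {
        Set<Integer> marked = new HashSet<>(positions);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (marked.contains(i)) {
                sb.append("<b>").append(tokens.get(i)).append("</b>");
            } else {
                sb.append(tokens.get(i));
            }
        }
        return sb.toString();
    }
}
```

For the phrase "quick fox" over the text "the quick fox saw a quick brown fox", term matching marks all four occurrences of "quick" and "fox", while the position-aware version marks only the adjacent pair at the start.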
Mark, thanks for your patience! I have one final (conceptual, high-level) question concerning the use of the MemoryIndex over the TokenStream. Is it a good idea to store the precomputed MemoryIndex (conceptually speaking) as a field in each document at indexing time and then just load this precomputed index from disk (as you do with TermVector), so that you save extra computation for the highlighting?
Marjan.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

