[ https://issues.apache.org/jira/browse/LUCENE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646177#action_12646177 ]
Koji Sekiguchi commented on LUCENE-1286:
----------------------------------------
bq. First rough patch to follow shortly.
Mark,
I'm very interested in this. How is it going?
> LargeDocHighlighter - another span highlighter optimized for large documents
> ----------------------------------------------------------------------------
>
> Key: LUCENE-1286
> URL: https://issues.apache.org/jira/browse/LUCENE-1286
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/highlighter
> Affects Versions: 2.4
> Reporter: Mark Miller
> Priority: Minor
>
> The existing Highlighter API is rich and well designed, but the approach
> taken is not very efficient for large documents.
> I believe that this is because the current Highlighter rebuilds the document
> by running through and scoring every token in the tokenstream.
> With a break in the current API, an alternate approach can be taken: rebuild
> the document by running through the query terms and using their offsets. The
> benefit is clear - a large doc will have a large tokenstream, but a query
> will likely be very small in comparison.
> I expect this approach to be quite a bit faster for very large documents,
> while still supporting Phrase and Span queries.
> First rough patch to follow shortly.
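
To make the offset-driven idea in the description concrete, here is a minimal sketch in plain Java. It is not the patch for this issue and uses no Lucene classes: the class name OffsetHighlightSketch, the highlight method, and the int[] {start, end} offset pairs are all hypothetical. It assumes the character offsets of the matching query terms are already available (for example, stored at index time), so the loop runs over the matches rather than over every token in the document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: names and structure are illustrative only and are
// not taken from the LUCENE-1286 patch or from the Lucene API.
class OffsetHighlightSketch {

    /**
     * Wraps each [start, end) character range of text in pre/post tags.
     * The ranges stand in for query-term offsets that were obtained
     * without re-tokenizing the text (an assumption of this sketch).
     */
    static String highlight(String text, List<int[]> offsets, String pre, String post) {
        // Copy and sort by start offset so the text can be stitched left to right.
        List<int[]> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt((int[] o) -> o[0]));

        StringBuilder out = new StringBuilder(text.length());
        int cursor = 0; // first character not yet copied to the output
        for (int[] o : sorted) {
            int start = Math.max(o[0], cursor);      // clip ranges that overlap earlier ones
            int end = Math.min(o[1], text.length()); // clip ranges that run past the text
            if (start >= end) {
                continue; // empty or fully covered by a previous range
            }
            out.append(text, cursor, start);                       // untouched text
            out.append(pre).append(text, start, end).append(post); // highlighted match
            cursor = end;
        }
        out.append(text, cursor, text.length()); // tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // Offsets of "fox" and "quick", deliberately given out of order.
        List<int[]> offsets = List.of(new int[] {16, 19}, new int[] {4, 9});
        System.out.println(highlight(text, offsets, "<b>", "</b>"));
    }
}
```

The loop iterates once per match, which is the point made in the description: a large document has many tokens, but a query usually produces comparatively few offsets. Phrase and span matching, scoring, and fragmenting are left out of this sketch.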
--
This message is automatically generated by JIRA.