[
https://issues.apache.org/jira/browse/LUCENE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651452#action_12651452
]
Koji Sekiguchi commented on LUCENE-1286:
----------------------------------------
Thanks, Mark. I've tried Ronnie's patch in LUCENE-644. It was very fast,
but our project needs phrase support.
So I'd like to know more about the approach mentioned in the issue
description. Can you elaborate on this: "rebuild the document by running
through the query terms by using their offsets"?
> LargeDocHighlighter - another span highlighter optimized for large documents
> ----------------------------------------------------------------------------
>
> Key: LUCENE-1286
> URL: https://issues.apache.org/jira/browse/LUCENE-1286
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/highlighter
> Affects Versions: 2.4
> Reporter: Mark Miller
> Priority: Minor
>
> The existing Highlighter API is rich and well designed, but the approach
> taken is not very efficient for large documents.
> I believe that this is because the current Highlighter rebuilds the document
> by running through and scoring every token in the tokenstream.
> With a break in the current API, an alternate approach can be taken: rebuild
> the document by running through the query terms by using their offsets. The
> benefit is clear - a large doc will have a large tokenstream, but a query
> will likely be very small in comparison.
> I expect this approach to be quite a bit faster for very large documents,
> while still supporting Phrase and Span queries.
> First rough patch to follow shortly.
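
As a rough illustration of the offset-based idea in the description, and of one way phrase support could fit into it, here is a minimal Java sketch. It does not use any Lucene API. It assumes each query term's token positions and character offsets are already available (in Lucene these could come from a term vector stored with positions and offsets). The `Occurrence` type, the whitespace-splitting `indexText` helper that stands in for an analyzer plus stored term vector, and the `<B>` tags are all hypothetical. Matching only touches the occurrences of the query terms. The only work proportional to document size is copying the text into the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OffsetHighlighterSketch {

    /** Hypothetical: one occurrence of a term (token position plus [start, end) char offsets). */
    static final class Occurrence {
        final int position, start, end;
        Occurrence(int position, int start, int end) {
            this.position = position; this.start = start; this.end = end;
        }
    }

    /** Hypothetical stand-in for a stored term vector: splits on spaces, records positions/offsets. */
    static Map<String, List<Occurrence>> indexText(String text) {
        Map<String, List<Occurrence>> index = new HashMap<>();
        int pos = 0, i = 0;
        while (i < text.length()) {
            if (text.charAt(i) == ' ') { i++; continue; }
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') i++;
            index.computeIfAbsent(text.substring(start, i), k -> new ArrayList<>())
                 .add(new Occurrence(pos++, start, i));
        }
        return index;
    }

    /** Collects [start, end) spans for each clause; a clause of n terms is a phrase. */
    static List<int[]> matchSpans(Map<String, List<Occurrence>> index, List<List<String>> clauses) {
        List<int[]> spans = new ArrayList<>();
        for (List<String> clause : clauses) {
            for (Occurrence first : index.getOrDefault(clause.get(0), Collections.emptyList())) {
                int end = first.end;
                boolean matched = true;
                // A phrase matches only if term i occurs at position first.position + i.
                for (int i = 1; i < clause.size() && matched; i++) {
                    matched = false;
                    for (Occurrence o : index.getOrDefault(clause.get(i), Collections.emptyList())) {
                        if (o.position == first.position + i) { end = o.end; matched = true; break; }
                    }
                }
                if (matched) spans.add(new int[] {first.start, end});
            }
        }
        return spans;
    }

    /** Rebuilds the text: copies the gaps between spans and wraps each merged span in tags. */
    static String highlight(String text, List<int[]> spans) {
        spans.sort((a, b) -> Integer.compare(a[0], b[0]));
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int i = 0; i < spans.size(); i++) {
            int start = spans.get(i)[0], end = spans.get(i)[1];
            // Merge overlapping or touching spans so tags never nest.
            while (i + 1 < spans.size() && spans.get(i + 1)[0] <= end) {
                end = Math.max(end, spans.get(++i)[1]);
            }
            out.append(text, cursor, start).append("<B>").append(text, start, end).append("</B>");
            cursor = end;
        }
        return out.append(text.substring(cursor)).toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox saw a quick red fox";
        // Query: the phrase "quick brown" plus the single term "fox".
        List<List<String>> clauses = Arrays.asList(Arrays.asList("quick", "brown"), Arrays.asList("fox"));
        System.out.println(highlight(text, matchSpans(indexText(text), clauses)));
        // prints: the <B>quick brown</B> <B>fox</B> saw a quick red <B>fox</B>
    }
}
```

The second "quick" is not highlighted because it is not followed by "brown" at the next position. Phrase matching here is a simple position-adjacency check. Slop and general span queries would need more than this sketch provides.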