[ https://issues.apache.org/jira/browse/LUCENE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557800#action_12557800 ]

Grant Ingersoll commented on LUCENE-644:
----------------------------------------

Is this still an issue?  Does this speedup still apply?

> Contrib: another highlighter approach
> -------------------------------------
>
>                 Key: LUCENE-644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Ronnie Kolehmainen
>            Priority: Minor
>         Attachments: FulltextHighlighter.java, FulltextHighlighter.java, 
> FulltextHighlighterTest.java, FulltextHighlighterTest.java, svn-diff.patch, 
> svn-diff.patch, TokenSources.java, TokenSources.java.diff
>
>
> Mark Harwood's highlighter package is a great contribution to Lucene, and 
> I've used it a lot. However, when you have *large* documents (fields), 
> highlighting can be quite time-consuming if you increase the number of bytes 
> to analyze with setMaxDocBytesToAnalyze(int). The default value of 50k is 
> often too low for indexed PDFs etc., which results in empty highlight 
> strings.
> This is an alternative approach that uses only term position vectors to 
> build fragment info objects. A StringReader can then read the relevant 
> fragments and skip() between them, which is a lot faster. This method also 
> uses the *entire* field to find the best fragments, so you're always 
> guaranteed to get a highlight snippet.
> Because this method only works with fields that have term positions stored, 
> you can check whether it applies to a particular field using the following 
> code (taken from TokenSources.java):
>         TermFreqVector tfv = reader.getTermFreqVector(docId, field);
>         // instanceof is false for null, so no separate null check is needed
>         if (tfv instanceof TermPositionVector)
>         {
>           // use FulltextHighlighter
>         }
>         else
>         {
>           // use standard Highlighter
>         }
> Someone else might find this useful so I'm posting the code here.
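
The skip-between-fragments idea in the quoted description can be sketched with plain java.io, independent of Lucene. This is only an illustration under stated assumptions, not the attached FulltextHighlighter code: the class name FragmentSkipSketch, the method readFragments, and the hard-coded character ranges are hypothetical. In the described approach the ranges would presumably be derived from the stored term vector rather than written by hand.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FragmentSkipSketch {

    /**
     * Reads only the given character ranges from text and skips the gaps.
     * Each range is {startOffset, endOffset} with an exclusive end. Ranges
     * must be sorted by start offset and must not overlap (hypothetical
     * input format, chosen for this sketch).
     */
    public static List<String> readFragments(String text, int[][] ranges) throws IOException {
        List<String> fragments = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            int position = 0; // number of characters consumed so far
            for (int[] range : ranges) {
                int start = range[0];
                int end = range[1];

                // skip() may skip fewer characters than requested, so loop.
                long toSkip = start - position;
                while (toSkip > 0) {
                    long skipped = reader.skip(toSkip);
                    if (skipped <= 0) {
                        throw new IOException("range starts past end of text");
                    }
                    toSkip -= skipped;
                }

                // Read the fragment itself; stop early at end of text.
                char[] buffer = new char[end - start];
                int filled = 0;
                while (filled < buffer.length) {
                    int n = reader.read(buffer, filled, buffer.length - filled);
                    if (n < 0) {
                        break;
                    }
                    filled += n;
                }
                fragments.add(new String(buffer, 0, filled));
                position = start + filled;
            }
        }
        return fragments;
    }

    public static void main(String[] args) throws IOException {
        String text = "Lucene is a search library. Highlighting large fields can be slow.";
        int[][] ranges = { {0, 6}, {28, 40} };
        System.out.println(readFragments(text, ranges));
    }
}
```

Only the characters inside the requested ranges are copied into new strings; the text between fragments is passed over with skip() and never tokenized, which is the property the description relies on for large fields.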

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


