[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563305#comment-13563305
 ] 

Commit Tag Bot commented on LUCENE-1822:
----------------------------------------

[trunk commit] Koji Sekiguchi
http://svn.apache.org/viewvc?view=revision&revision=1438822

LUCENE-1822: add a note in Changes in runtime behavior

                
> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1822
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 2.9
>         Environment: any
>            Reporter: Alex Vigdor
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well. However, I've found in 
> testing that the window of text chosen per fragment is often very poor, because 
> SimpleFragListBuilder is hard-coded to always start the fragment 6 characters 
> to the left of the first phrase match.  When selecting long fragments, this 
> often means that there is barely any context before the highlighted word and 
> lots after it.  Even worse, when highlighting a phrase at the end of a short 
> text, the beginning is cut off even though the entire phrase would fit in the 
> specified fragCharSize.  For example, highlighting "Punishment" in "Crime and 
> Punishment" returns "e and <b>Punishment</b>" no matter what fragCharSize is 
> specified.  I am going to attach a patch that improves the text window 
> selection by recalculating the starting margin once all phrases in the 
> fragment have been identified.  This way, if a single word is matched in a 
> fragment, it will appear in the middle of the highlight instead of 6 
> characters from the beginning.  One can also guarantee that the entirety of a 
> short text is represented in a fragment by specifying a large enough 
> fragCharSize.
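
The difference between the fixed 6-character margin and a recalculated margin can be sketched as follows. This is only an illustration of the idea in the description, not the code from the attached patch and not Lucene's API: the class FragmentWindow and the methods fixedMarginStart and centeredStart are hypothetical names, and the centering rule (split the spare characters evenly on both sides of the match) is an assumption.

```java
// Hypothetical sketch; none of these names exist in Lucene.
class FragmentWindow {

    /** Old behavior as described: start a fixed number of chars before the first match. */
    static int fixedMarginStart(int matchStart, int margin) {
        return Math.max(0, matchStart - margin);
    }

    /**
     * Assumed alternative: once the matched span [matchStart, matchEnd) is known,
     * split the spare characters of the fragment evenly around it.
     */
    static int centeredStart(int matchStart, int matchEnd, int fragCharSize, int textLength) {
        int matchLen = matchEnd - matchStart;
        int spare = Math.max(0, fragCharSize - matchLen);
        int start = Math.max(0, matchStart - spare / 2);
        // If the window would run past the end of the text, shift it left
        // so that the unused room goes to the leading context instead.
        if (start + fragCharSize > textLength) {
            start = Math.max(0, textLength - fragCharSize);
        }
        return start;
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int s = text.indexOf("Punishment");
        int e = s + "Punishment".length();

        // Fixed margin of 6: the beginning of the title is cut off.
        System.out.println(text.substring(fixedMarginStart(s, 6)));

        // Recalculated margin with a generous fragCharSize: the whole title fits.
        int fragCharSize = 100;
        int start = centeredStart(s, e, fragCharSize, text.length());
        int end = Math.min(text.length(), start + fragCharSize);
        System.out.println(text.substring(start, end));
    }
}
```

With the fixed margin the first line printed is "e and Punishment", matching the behavior reported above; with the recalculated start and a fragCharSize larger than the text, the full "Crime and Punishment" is printed.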
