[ 
https://issues.apache.org/jira/browse/LUCENE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinya Kasatani updated LUCENE-2910:
------------------------------------

    Description: 
When you use the Highlighter with N-gram tokenizers such as CJKTokenizer
and try to highlight a phrase that appears around the 50th term in the
field, the highlighted region is shorter than expected.

{noformat}
e.g. Highlighting "fooo" in the following text with a bigram tokenizer:
"0---------1---------2---------3---------4---------fooo---"

Expected: "0---------1---------2---------3---------4---------<B>fooo</B>---"
Actual: "0---------1---------2---------3---------4---------f<B>ooo</B>---"
{noformat}

  was:
When you use the Highlighter with N-gram tokenizers such as CJKTokenizer
and try to highlight a phrase that appears around the 50th term in the
field, the highlighted region is shorter than expected.

e.g. Highlighting "fooo" in the following text with a bigram tokenizer:
"0---------1---------2---------3---------4---------fooo---"

Expected: "0---------1---------2---------3---------4---------<B>fooo</B>---"
Actual: "0---------1---------2---------3---------4---------f<B>ooo</B>---"



> Highlighter does not correctly highlight the phrase around 50th term
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2910
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2910
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9.4
>            Reporter: Shinya Kasatani
>            Priority: Trivial
>         Attachments: HighlighterFix.patch
>
>
> When you use the Highlighter with N-gram tokenizers such as CJKTokenizer
> and try to highlight a phrase that appears around the 50th term in the
> field, the highlighted region is shorter than expected.
> {noformat}
> e.g. Highlighting "fooo" in the following text with a bigram tokenizer:
> "0---------1---------2---------3---------4---------fooo---"
> Expected: "0---------1---------2---------3---------4---------<B>fooo</B>---"
> Actual: "0---------1---------2---------3---------4---------f<B>ooo</B>---"
> {noformat}
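
For readers unfamiliar with bigram tokenization, the expected result above can be reproduced with a small standalone sketch. It does not use the Lucene Highlighter or CJKTokenizer, and it does not attempt to reproduce the bug itself. The class and method names (BigramHighlightSketch, bigrams, highlight) are hypothetical and exist only for illustration. The sketch assumes a naive matching rule: every two-character token of the text that also occurs in the query counts as a hit, and hits whose character ranges overlap are merged into one highlighted span.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: NOT the Lucene Highlighter, just the expected behavior.
class BigramHighlightSketch {

    /** Hypothetical helper: every overlapping two-character token of s. */
    static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    /** Wraps each run of overlapping matching bigrams in a single <B>...</B> pair. */
    static String highlight(String text, String query) {
        Set<String> wanted = bigrams(query);
        StringBuilder sb = new StringBuilder();
        int copied = 0; // index of the first character of text not yet emitted
        int i = 0;      // start offset of the bigram currently examined
        while (i + 2 <= text.length()) {
            if (!wanted.contains(text.substring(i, i + 2))) {
                i++;
                continue;
            }
            int start = i;
            int end = i + 2;
            int j = i + 1;
            // Adjacent bigrams share one character, so consecutive hits
            // overlap and are merged into one span [start, end).
            while (j + 2 <= text.length() && wanted.contains(text.substring(j, j + 2))) {
                end = j + 2;
                j++;
            }
            sb.append(text, copied, start)
              .append("<B>").append(text, start, end).append("</B>");
            copied = end;
            i = j; // the bigram at j did not match (or is out of range)
        }
        sb.append(text.substring(copied));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "0---------1---------2---------3---------4---------fooo---";
        System.out.println(highlight(text, "fooo"));
    }
}
```

With the text from the report, the three hits "fo", "oo", "oo" cover offsets 50-52, 51-53 and 52-54, so the merged span is the full "fooo". The "Actual" output in the report is what you would get if the first of those tokens were left out of the merged span.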

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
