[ 
https://issues.apache.org/jira/browse/LUCENE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2035:
--------------------------------

    Fix Version/s: 3.1
                   3.0.3
                   2.9.4

> TokenSources.getTokenStream() does not assign positionIncrement
> ---------------------------------------------------------------
>
>                 Key: LUCENE-2035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2035
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Christopher Morris
>            Assignee: Mark Miller
>             Fix For: 2.9.4, 3.0.3, 3.1, 4.0
>
>         Attachments: LUCENE-2035.patch, LUCENE-2035.patch, LUCENE-2305.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> TokenSources.StoredTokenStream does not assign positionIncrement information. 
> This means that all tokens in the stream are considered adjacent. This has 
> implications for the phrase highlighting in QueryScorer when using 
> non-contiguous tokens.
> For example:
> Consider a token stream that emits both the stemmed and the unstemmed 
> form of each word at the same position: the fox (jump|jumped)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), 
> the token stream will be - the fox jump jumped
> Now try a search and highlight for the phrase query "fox jumped". The search 
> will correctly find the document; the highlighter will fail to highlight the 
> phrase because it thinks that there is an additional word between "fox" and 
> "jumped". If we use the original (from the analyzer) token stream then the 
> highlighter works.
> Also, consider the converse - the fox did not jump
> "not" is a stop word and there is an option to increment the position to 
> account for stop words - (the,0) (fox,1) (did,2) (jump,4)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), 
> the token stream will be - (the,0) (fox,1) (did,2) (jump,3).
> So the phrase query "did jump" will cause the "did" and "jump" terms in the 
> text "did not jump" to be highlighted incorrectly. If we use the original 
> (from the analyzer) token stream then the highlighter works correctly.
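
The two cases in the description can be reproduced without Lucene. The sketch below is only an illustration, not the highlighter's code: the class PositionIncrementSketch, its Token type and the helpers positions(), adjacent() and flatten() are hypothetical names invented for this example. It assumes that a token's absolute position is the running sum of its position increments, and that flatten() (every increment forced to 1) approximates what the description says happens when the stream is rebuilt from the term vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model; none of these names are Lucene APIs.
class PositionIncrementSketch {

    /** A term plus its gap (position increment) from the previous token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Absolute positions: running sum of increments, first token at 0. */
    static int[] positions(List<Token> tokens) {
        int[] result = new int[tokens.size()];
        int pos = -1;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).positionIncrement;
            result[i] = pos;
        }
        return result;
    }

    /** True if some occurrence of second sits exactly one position after first. */
    static boolean adjacent(List<Token> tokens, String first, String second) {
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).term.equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).term.equals(second) && pos[j] == pos[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Models the reported behaviour: every increment is replaced by 1. */
    static List<Token> flatten(List<Token> tokens) {
        List<Token> flat = new ArrayList<>();
        for (Token t : tokens) {
            flat.add(new Token(t.term, 1));
        }
        return flat;
    }

    /** the fox (jump|jumped): "jumped" shares a position with "jump". */
    static List<Token> stemmedExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("jump", 1), new Token("jumped", 0));
    }

    /** the fox did [not] jump: the removed stop word leaves a gap of 2. */
    static List<Token> stopWordExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("did", 1), new Token("jump", 2));
    }

    public static void main(String[] args) {
        List<Token> stemmed = stemmedExample();
        System.out.println(adjacent(stemmed, "fox", "jumped"));           // true
        System.out.println(adjacent(flatten(stemmed), "fox", "jumped"));  // false

        List<Token> stopped = stopWordExample();
        System.out.println(adjacent(stopped, "did", "jump"));             // false
        System.out.println(adjacent(flatten(stopped), "did", "jump"));    // true
    }
}
```

With the increments preserved, "fox jumped" is a phrase match and "did jump" is not. Once every increment is flattened to 1, both answers flip, which matches the missed highlight and the spurious highlight described above.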

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


