[jira] [Commented] (LUCENE-8132) HyphenationDecompoundTokenFilter does not set position/offset attributes correctly

Robert Muir (JIRA) Mon, 22 Jan 2018 05:45:20 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334273#comment-16334273 ]


Robert Muir commented on LUCENE-8132:
-------------------------------------

That's what HyphenationDecompoundTokenFilter already does. I think maybe the 
name is confusing; at least look at the class javadocs :)

In this case I'm sorry, but I think you are stretching (and you aren't 
correct). We should fix these filters and enforce a tokenizer as input, seriously.

> HyphenationDecompoundTokenFilter does not set position/offset attributes 
> correctly
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 6.6.1, 7.2.1
>            Reporter: Holger Bruch
>            Priority: Major
>
> HyphenationDecompoundTokenFilter and DictionaryDecompoundTokenFilter set 
> positionIncrement to 0 for all subwords, reuse the start/end offsets of the 
> original token and ignore positionLength completely.
> As a consequence, the QueryBuilder generates a SynonymQuery comprising all 
> subwords, which should rather be treated as individual terms.
>  
>  
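To make the reported behaviour concrete, here is a minimal sketch in plain Java that does not use Lucene at all. The `Token` class, the `decompound` and `byPosition` helpers, and the sample word with its subwords are all hypothetical stand-ins, assuming the filter keeps the original token and then emits each subword with a position increment of 0 and the original offsets, as the description says.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DecompoundPositions {

    /** Hypothetical stand-in for a token's term, position increment and offsets. */
    static final class Token {
        final String term;
        final int posInc;
        final int startOffset;
        final int endOffset;

        Token(String term, int posInc, int startOffset, int endOffset) {
            this.term = term;
            this.posInc = posInc;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Assumed behaviour from the issue description: keep the original token,
     * then add every subword with posInc 0 and the original token's offsets.
     */
    static List<Token> decompound(Token original, List<String> subwords) {
        List<Token> out = new ArrayList<>();
        out.add(original);
        for (String sub : subwords) {
            out.add(new Token(sub, 0, original.startOffset, original.endOffset));
        }
        return out;
    }

    /** Groups terms by absolute position, the way a position-aware consumer would see them. */
    static Map<Integer, List<String>> byPosition(List<Token> tokens) {
        Map<Integer, List<String>> positions = new TreeMap<>();
        int pos = -1; // positions start at -1 so the first increment of 1 lands on 0
        for (Token t : tokens) {
            pos += t.posInc;
            positions.computeIfAbsent(pos, k -> new ArrayList<>()).add(t.term);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Sample compound and subwords are illustrative, not taken from the issue.
        Token compound = new Token("fussballpumpe", 1, 0, 13);
        List<Token> tokens = decompound(compound, Arrays.asList("fuss", "ball", "pumpe"));
        System.out.println(byPosition(tokens));
    }
}
```

In this sketch every term lands on position 0, which is the stacking that, per the description, leads the QueryBuilder to treat the subwords as synonyms of one another rather than as separate terms.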



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

