[ https://issues.apache.org/jira/browse/LUCENE-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334273#comment-16334273 ]
Robert Muir commented on LUCENE-8132:
-------------------------------------

That's what HyphenationDecompoundTokenFilter already does. I think maybe the name is confusing; at least look at the class javadocs :) In this case, I'm sorry, but I think you are stretching (and you aren't correct). We should fix these filters and enforce a tokenizer as input, seriously.

> HyphenationDecompoundTokenFilter does not set position/offset attributes correctly
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 6.6.1, 7.2.1
>            Reporter: Holger Bruch
>            Priority: Major
>
> HyphenationDecompoundTokenFilter and DictionaryDecompoundTokenFilter set
> positionIncrement to 0 for all subwords, reuse the start/end offsets of the
> original token, and ignore positionLength completely.
> As a consequence, the QueryBuilder generates a SynonymQuery comprising all
> subwords, which should rather be treated as individual terms.