[ https://issues.apache.org/jira/browse/LUCENE-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335427#comment-16335427 ]
Holger Bruch commented on LUCENE-8132:
--------------------------------------

I’m not as deeply into Lucene as you are. What would be the pros and cons of ensuring that the input is an instance of Tokenizer? Would it still be possible to apply token filters such as WDF or a lowercase filter before the HyphenationDecompoundTokenFilter?

> HyphenationDecompoundTokenFilter does not set position/offset attributes correctly
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 6.6.1, 7.2.1
>            Reporter: Holger Bruch
>            Priority: Major
>
> HyphenationDecompoundTokenFilter and DictionaryDecompoundTokenFilter set positionIncrement to 0 for all subwords, reuse the start/end offsets of the original token, and ignore positionLength completely.
> As a consequence, the QueryBuilder generates a SynonymQuery comprising all subwords, which should instead be treated as individual terms.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
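To make the reported behavior concrete, here is a minimal, Lucene-free Java sketch. It models a token stream as a list of hypothetical `Tok` records (term, position increment, position length, start/end offsets) inside a hypothetical `PositionSketch` class, and derives absolute positions the way position increments are defined: each token's position is the previous position plus its increment. The input word "Donaudampfschiff" and its subwords are invented for illustration, and the `ALTERNATIVE` layout is only one possible assumption, not the fix adopted in Lucene. With every subword at increment 0, all tokens land on a single position, which is the situation in which, per the report, QueryBuilder builds a SynonymQuery.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free model of a token stream, used only to illustrate
// how position increments translate into positions. Requires Java 16+ (records).
public class PositionSketch {

    // One token's attributes: term, positionIncrement, positionLength, start/end offsets.
    record Tok(String term, int posInc, int posLen, int start, int end) {}

    // Behavior described in the report, applied to the invented input "Donaudampfschiff":
    // subwords get increment 0 and reuse the original token's offsets; posLen stays 1.
    static final List<Tok> REPORTED = List.of(
            new Tok("donaudampfschiff", 1, 1, 0, 16),
            new Tok("donau", 0, 1, 0, 16),
            new Tok("dampf", 0, 1, 0, 16),
            new Tok("schiff", 0, 1, 0, 16));

    // One hypothetical alternative layout (an assumption): subwords advance the position
    // and carry their own offsets, while the original token spans all three positions.
    static final List<Tok> ALTERNATIVE = List.of(
            new Tok("donaudampfschiff", 1, 3, 0, 16),
            new Tok("donau", 0, 1, 0, 5),
            new Tok("dampf", 1, 1, 5, 10),
            new Tok("schiff", 1, 1, 10, 16));

    // Absolute position of each token: running sum of increments, starting at -1
    // so that a first token with increment 1 lands on position 0.
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc();
            pos[i] = p;
        }
        return pos;
    }

    // Number of distinct positions occupied by the stream.
    static long distinctPositions(List<Tok> toks) {
        return Arrays.stream(positions(toks)).distinct().count();
    }

    // Print each token with its computed position and attributes.
    static void dump(String label, List<Tok> toks) {
        int[] pos = positions(toks);
        System.out.println(label + " (distinct positions: " + distinctPositions(toks) + ")");
        for (int i = 0; i < toks.size(); i++) {
            Tok t = toks.get(i);
            System.out.printf("  %-16s pos=%d posInc=%d posLen=%d offsets=%d-%d%n",
                    t.term(), pos[i], t.posInc(), t.posLen(), t.start(), t.end());
        }
    }

    public static void main(String[] args) {
        dump("reported", REPORTED);
        dump("alternative", ALTERNATIVE);
    }
}
```

Running `main` shows every token of the reported layout at position 0 (one distinct position), while the alternative layout spreads the subwords over three positions and gives each subword offsets that point at its own slice of the input.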