[ https://issues.apache.org/jira/browse/LUCENE-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334065#comment-16334065 ]
Adrien Grand commented on LUCENE-8132:
--------------------------------------

I'm not sure how practical this would be: I think some tokenizers today sometimes set the position increment to 0 (JapaneseTokenizer?), and it would only allow one such filter in the analysis chain.

> HyphenationDecompoundTokenFilter does not set position/offset attributes correctly
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 6.6.1, 7.2.1
>            Reporter: Holger Bruch
>            Priority: Major
>
> HyphenationDecompoundTokenFilter and DictionaryDecompoundTokenFilter set
> positionIncrement to 0 for all subwords, reuse the start/end offsets of the
> original token, and ignore positionLength completely.
> As a consequence, the QueryBuilder generates a SynonymQuery comprising all
> subwords, which should rather be treated as individual terms.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
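
To make the reported behaviour concrete, here is a minimal sketch of how position increments map tokens to positions. It is a Lucene-free model, not the filter's actual code: the class PositionStackingSketch, its Token type, the example compound "Dampfschiff", and the order in which tokens are emitted are all assumptions made for illustration. Terms that end up at the same position are the ones a query builder would be likely to treat as alternatives of each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Lucene-free model; none of these names are Lucene classes.
final class PositionStackingSketch {

    // A term together with its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Groups terms by absolute position. The counter starts at -1 so that a
    // first token with increment 1 lands at position 0 (a modelling assumption).
    static Map<Integer, List<String>> groupByPosition(List<Token> tokens) {
        Map<Integer, List<String>> byPosition = new TreeMap<>();
        int position = -1;
        for (Token token : tokens) {
            position += token.positionIncrement;
            byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(token.term);
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Behaviour described in the issue: every subword has increment 0.
        List<Token> reported = Arrays.asList(
            new Token("Dampfschiff", 1),
            new Token("Dampf", 0),
            new Token("schiff", 0));

        // One possible alternative (an assumption, not a fix proposed in the
        // issue): the second subword advances the position by one.
        List<Token> sequential = Arrays.asList(
            new Token("Dampfschiff", 1),
            new Token("Dampf", 0),
            new Token("schiff", 1));

        System.out.println(groupByPosition(reported));   // {0=[Dampfschiff, Dampf, schiff]}
        System.out.println(groupByPosition(sequential)); // {0=[Dampfschiff, Dampf], 1=[schiff]}
    }
}
```

In the first stream all three terms share position 0, which matches the single SynonymQuery described in the issue. The second stream is only meant to show the contrast; in a real token stream the original token would presumably also need a positionLength spanning both subwords, which this sketch does not model.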