[ https://issues.apache.org/jira/browse/LUCENE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948165#comment-16948165 ]
Chongchen Chen commented on LUCENE-8137:
----------------------------------------

To solve this problem, would it be better to mark filtered tokens instead of actually removing them?

> GraphTokenStreamFiniteStrings does not handle position inc > 1 in multi-word synonyms
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8137
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8137
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 7.2.1, 8.0
>            Reporter: Jim Ferenczi
>            Assignee: Jim Ferenczi
>            Priority: Major
>         Attachments: SGF_SF_interaction.patch
>
>
> The automaton built for graph queries that contain multiple multi-word
> synonyms does not handle gaps if they appear in the middle of a multi-word
> synonym. In such cases, the token next to the gap is treated as part of the
> multi-word synonym.
> Stop words that appear before or after multi-word synonyms are handled
> correctly in the current version, but the synonym rule "part of speech, pos",
> for instance, does not produce the expected query if "of" is removed by a
> filter placed after the synonym_graph filter. One solution would be to reuse
> TokenStreamToAutomaton (with minor changes to allow it to create token
> transitions rather than char transitions), which preserves gaps (as
> transitions) in the produced automaton.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
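The gap-preserving idea from the issue (keep a removed position as an explicit transition instead of skipping it) can be illustrated with a small, Lucene-free sketch. Everything in it (`GapPreservingGraph`, `Token`, `Edge`, the `<GAP>` label) is hypothetical and is not Lucene's API, nor the actual GraphTokenStreamFiniteStrings or TokenStreamToAutomaton code. It assumes the stop filter placed after the synonym graph filter folds the removed "of" into the position increment of "speech" (posInc = 2). Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch of a token graph that keeps position gaps
// as explicit GAP transitions. Not Lucene's API.
final class GapPreservingGraph {
    static final String GAP = "<GAP>";

    // Hypothetical token: term text, position increment, position length.
    record Token(String term, int posInc, int posLen) {}

    // Transition between two position nodes, labelled with a term or GAP.
    record Edge(int from, int to, String label) {}

    static List<Edge> buildEdges(List<Token> tokens) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            // Assumption: a posInc > 1 means the positions in between held
            // tokens removed by a later filter; keep each one as a GAP edge.
            for (int g = 1; g < t.posInc(); g++) {
                edges.add(new Edge(pos + g, pos + g + 1, GAP));
            }
            pos += t.posInc();
            // The token spans posLen positions, starting at its own position.
            edges.add(new Edge(pos, pos + t.posLen(), t.term()));
        }
        return edges;
    }

    // Enumerates every label sequence from node 0 to the last node.
    static List<List<String>> finiteStrings(List<Edge> edges) {
        int end = 0;
        for (Edge e : edges) {
            end = Math.max(end, e.to());
        }
        List<List<String>> out = new ArrayList<>();
        walk(edges, 0, end, new ArrayList<>(), out);
        return out;
    }

    // Depth-first walk; terminates because every edge moves to a later node.
    private static void walk(List<Edge> edges, int node, int end,
                             List<String> path, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(path));
            return;
        }
        for (Edge e : edges) {
            if (e.from() == node) {
                path.add(e.label());
                walk(edges, e.to(), end, path, out);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // "part of speech" with the rule "part of speech, pos", after a stop
        // filter removed "of" (assumed token order and attribute values).
        List<Token> tokens = List.of(
                new Token("pos", 1, 3),
                new Token("part", 0, 1),
                new Token("speech", 2, 1));
        // prints [[pos], [part, <GAP>, speech]]
        System.out.println(finiteStrings(buildEdges(tokens)));
    }
}
```

In this sketch the GAP edge is what lets the `part ... speech` path reach the same final node as `pos`; without it, node 1 would have no outgoing transition and that path could not be completed.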