[ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782609#comment-15782609 ]
Michael McCandless commented on LUCENE-7603:
--------------------------------------------

This change looks great; I think it's ready!

The new {{TestGraphTokenStreamFiniteStrings}} is just missing the copyright header; I'll fix that before pushing.

The gist of the change is that when query parsing detects that the analyzer produced a graph (any token with {{PositionLengthAttribute}} > 1), e.g. because {{SynonymGraphFilter}} matched or inserted a multi-token synonym, it creates a {{GraphQuery}}, which is just a wrapper around sub-queries that traverse each path of the graph.

At search time, this query is currently rewritten to a {{BooleanQuery}} with one clause for each path, but that is maybe something we can improve in the future, e.g. if it's a phrase query we could use {{TermAutomatonQuery}} ... but we should tackle that separately.

At long last, this (along with using {{SynonymGraphFilter}} at search time) finally fixes the long-standing bugs around multi-token synonyms, e.g. LUCENE-4499, LUCENE-1622, https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter ...

This will also be useful for other tokenizers/token filters, e.g. I'm working on having {{WordDelimiterFilter}} set position length correctly, and Kuromoji ({{JapaneseTokenizer}}) already produces graph tokens.

> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can
> use multi-term synonyms at query time. A "graph token stream" will be created,
> which is nothing more than using the position length attribute on
> stacked tokens to indicate how many positions a token should span.
> Currently the position length attribute on tokens is ignored during query parsing.
> This issue will add support for handling these graph token streams inside the
> QueryBuilder utility class used by query parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
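Both the comment and the quoted issue rest on one idea: each token's position increment and position length define an arc in a graph, and the query builder produces one sub-query per path through that graph. The following is a minimal, self-contained Java sketch of that idea only. It assumes a hypothetical `Token` class carrying just the term, position increment and position length (the same two numbers Lucene exposes through `PositionIncrementAttribute` and `PositionLengthAttribute`); it is not Lucene's `TokenStream` API and not how `GraphTokenStreamFiniteStrings` or `GraphQuery` are actually implemented. The example stream is an illustrative guess at what a "wi fi" => "wifi" synonym rule might produce, and `toQueryString` is a hypothetical helper that renders one OR-ed phrase per path, mirroring the one-clause-per-path `BooleanQuery` rewrite described above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy token graph, NOT Lucene's API.
class TokenGraphPaths {

    // Hypothetical token: the term plus the two graph attributes.
    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions this token spans (>= 1)

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // Enumerates every path from the first token's start position to the
    // furthest end position, returning the terms along each path.
    static List<List<String>> paths(List<Token> tokens) {
        List<List<String>> out = new ArrayList<>();
        if (tokens.isEmpty()) {
            return out;
        }
        List<int[]> arcs = new ArrayList<>(); // {start, end, tokenIndex}
        int pos = -1, first = -1, last = -1;
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            pos += t.posInc;                 // absolute start position
            if (first < 0) {
                first = pos;
            }
            int end = pos + t.posLen;        // a token spans posLen positions
            arcs.add(new int[] {pos, end, i});
            last = Math.max(last, end);
        }
        walk(first, last, arcs, tokens, new ArrayList<>(), out);
        return out;
    }

    // Depth-first walk: follow every arc leaving position `at`.
    private static void walk(int at, int last, List<int[]> arcs, List<Token> tokens,
                             List<String> prefix, List<List<String>> out) {
        if (at == last) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int[] arc : arcs) {
            if (arc[0] == at) {
                prefix.add(tokens.get(arc[2]).term);
                walk(arc[1], last, arcs, tokens, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    // Hypothetical rendering: one quoted phrase per path, OR-ed together.
    static String toQueryString(List<List<String>> paths) {
        List<String> clauses = new ArrayList<>();
        for (List<String> path : paths) {
            clauses.add("\"" + String.join(" ", path) + "\"");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // "wifi" spans two positions, stacked over "wi" and "fi".
        List<Token> stream = List.of(
            new Token("wifi", 1, 2),
            new Token("wi", 0, 1),
            new Token("fi", 1, 1),
            new Token("network", 1, 1));
        List<List<String>> p = paths(stream);
        System.out.println(p);                // [[wifi, network], [wi, fi, network]]
        System.out.println(toQueryString(p)); // "wifi network" OR "wi fi network"
    }
}
```

If the position length were ignored (treated as 1 everywhere, as the issue says query parsing currently does), "wifi" would be taken to cover only the first position, which is exactly the mis-parse that multi-token synonyms used to suffer from. The sketch deliberately ignores holes left by removed stop words (a position increment > 1 can leave a position with no outgoing arc, so the walk dead-ends there) and the fact that the number of paths can grow exponentially with graph size.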