[
https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782609#comment-15782609
]
Michael McCandless commented on LUCENE-7603:
--------------------------------------------
This change looks great; I think it's ready! The new
{{TestGraphTokenStreamFiniteStrings}} is just missing the copyright header;
I'll fix that before pushing.
The gist of the change is that when query parsing detects that the analyzer
produced a graph (any token with {{PositionLengthAttribute}} > 1), e.g. because
{{SynonymGraphFilter}} matched or inserted a multi-token synonym, it creates a
{{GraphQuery}}, which is just a wrapper around sub-queries that traverse each
path of the graph.
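To make the detection rule concrete, here is a minimal sketch in plain Java. It
does not use Lucene's classes: {{Tok}} is a hypothetical stand-in for an
analyzed token carrying a position increment and a position length, and the
token values are only roughly what a synonym rule "ny" <-> "new york" might
produce, not the verified output of {{SynonymGraphFilter}}.

```java
import java.util.Arrays;
import java.util.List;

class GraphDetectSketch {
    /** Hypothetical stand-in for one analyzed token; not a Lucene class. */
    static final class Tok {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions this token spans

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** A stream counts as a graph if any token spans more than one position. */
    static boolean isGraph(List<Tok> tokens) {
        for (Tok t : tokens) {
            if (t.posLen > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed shape for "ny city": "ny" spans the two positions
        // that "new" and "york" occupy one at a time.
        List<Tok> withSynonym = Arrays.asList(
            new Tok("ny", 1, 2),
            new Tok("new", 0, 1),
            new Tok("york", 1, 1),
            new Tok("city", 1, 1));
        List<Tok> plain = Arrays.asList(
            new Tok("new", 1, 1),
            new Tok("york", 1, 1));

        System.out.println(isGraph(withSynonym)); // true
        System.out.println(isGraph(plain));       // false
    }
}
```

Note that stacked single-position synonyms (position increment 0, position
length 1) do not trip this check; only a token spanning several positions does.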
At search time, this query is currently rewritten to a {{BooleanQuery}} with
one clause for each path. That is something we may be able to improve in the
future, e.g. if it's a phrase query we could use {{TermAutomatonQuery}}, but we
should tackle that separately.
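The "one clause for each path" idea can be sketched in plain Java as follows.
This is an illustration under assumptions, not the code in the patch: {{Tok}},
{{paths}} and {{asDisjunction}} are hypothetical names, each path is rendered
as a quoted phrase purely for display, and position holes (increments greater
than 1) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GraphPathsSketch {
    /** Hypothetical stand-in for one analyzed token; not a Lucene class. */
    static final class Tok {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions this token spans

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Enumerates every start-to-end path through the token graph. */
    static List<List<String>> paths(List<Tok> tokens) {
        int[] start = new int[tokens.size()];
        int pos = -1;
        int end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc; // absolute start position
            start[i] = pos;
            end = Math.max(end, pos + tokens.get(i).posLen);
        }
        List<List<String>> out = new ArrayList<>();
        walk(tokens, start, 0, end, new ArrayList<>(), out);
        return out;
    }

    private static void walk(List<Tok> tokens, int[] start, int at, int end,
                             List<String> path, List<List<String>> out) {
        if (at == end) {
            out.add(new ArrayList<>(path)); // reached the final position
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (start[i] == at) {
                path.add(tokens.get(i).term);
                // Jump ahead by the number of positions this token spans.
                walk(tokens, start, at + tokens.get(i).posLen, end, path, out);
                path.remove(path.size() - 1);
            }
        }
    }

    /** Renders one clause per path, joined as a disjunction. */
    static String asDisjunction(List<List<String>> paths) {
        List<String> clauses = new ArrayList<>();
        for (List<String> p : paths) {
            clauses.add("\"" + String.join(" ", p) + "\"");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("ny", 1, 2),
            new Tok("new", 0, 1),
            new Tok("york", 1, 1),
            new Tok("city", 1, 1));
        // prints: "ny city" OR "new york city"
        System.out.println(asDisjunction(paths(tokens)));
    }
}
```

The number of paths can grow quickly when several multi-token synonyms occur in
one query, which is one reason a single automaton-based query could be
attractive later.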
At long last, this (along with using {{SynonymGraphFilter}} at search time)
finally fixes the long-standing bugs around multi-token synonyms, e.g.
LUCENE-4499, LUCENE-1622,
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter
...
This will also be useful for other tokenizers/token filters, e.g. I'm
working on having {{WordDelimiterFilter}} set position length correctly, and
Kuromoji ({{JapaneseTokenizer}}) already produces graph tokens.
> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
> Key: LUCENE-7603
> URL: https://issues.apache.org/jira/browse/LUCENE-7603
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser, core/search
> Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can
> use multi-term synonyms at query time. A "graph token stream" will be created,
> which is nothing more than using the position length attribute on
> stacked tokens to indicate how many positions a token should span. Currently
> the position length attribute on tokens is ignored during query parsing.
> This issue will add support for handling these graph token streams inside the
> QueryBuilder utility class used by query parsers.