[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder

Michael McCandless (JIRA) Wed, 28 Dec 2016 02:37:48 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782609#comment-15782609
 ]


Michael McCandless commented on LUCENE-7603:
--------------------------------------------

This change looks great; I think it's ready!  The new 
{{TestGraphTokenStreamFiniteStrings}} is just missing the copyright header; 
I'll fix that before pushing.

The gist of the change is that when query parsing detects that the analyzer 
produced a graph (any token with {{PositionLengthAttribute}} > 1), e.g. because 
{{SynonymGraphFilter}} matched or inserted a multi-token synonym, it creates 
a {{GraphQuery}}, which is just a wrapper around sub-queries that traverse 
each path of the graph.
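
To make the detection rule concrete, here is a minimal sketch of the "any token with position length > 1" check. It is not the code from the patch and does not use Lucene's attribute API: the {{Token}} class, its fields, and {{isGraph}} are hypothetical stand-ins that only model the position increment and position length of each token.

```java
import java.util.Arrays;
import java.util.List;

public class GraphDetectSketch {
    /** Hypothetical stand-in for one analyzed token; not a Lucene class. */
    static final class Token {
        final String term;
        final int positionIncrement; // 0 means "stacked" on the previous token
        final int positionLength;    // number of positions the token spans

        Token(String term, int positionIncrement, int positionLength) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.positionLength = positionLength;
        }
    }

    /** A stream is treated as a graph as soon as one token spans more than one position. */
    static boolean isGraph(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.positionLength > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "ny" spans two positions and is stacked with "new", so it covers "new york".
        List<Token> withSynonym = Arrays.asList(
            new Token("ny", 1, 2),
            new Token("new", 0, 1),
            new Token("york", 1, 1));
        List<Token> plain = Arrays.asList(
            new Token("new", 1, 1),
            new Token("york", 1, 1));

        System.out.println(isGraph(withSynonym)); // prints true
        System.out.println(isGraph(plain));       // prints false
    }
}
```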

At search time, this query is currently rewritten to a {{BooleanQuery}} with 
one clause for each path, but that is something we may be able to improve in 
the future, e.g. if it's a phrase query we could use {{TermAutomatonQuery}} 
... but we should tackle that separately.
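
As a rough illustration of "one clause for each path", the sketch below enumerates every path through a small token graph and joins the paths as alternatives. It is an assumption-laden model, not the patch's implementation: {{Arc}}, {{paths}} and {{asDisjunction}} are hypothetical names, each token is modelled as an arc from its start position to start + position length, and the output is a plain string rather than a real {{BooleanQuery}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GraphPathsSketch {
    /** Hypothetical arc: a term leading from one position to a later one (to > from). */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Returns every term sequence that leads from position start to position end. */
    static List<List<String>> paths(List<Arc> arcs, int start, int end) {
        List<List<String>> out = new ArrayList<>();
        walk(arcs, start, end, new ArrayList<String>(), out);
        return out;
    }

    // Depth-first walk; terminates because every arc moves strictly forward.
    private static void walk(List<Arc> arcs, int pos, int end,
                             List<String> prefix, List<List<String>> out) {
        if (pos == end) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc a : arcs) {
            if (a.from == pos) {
                prefix.add(a.term);
                walk(arcs, a.to, end, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    /** Renders one parenthesized clause per path, joined as alternatives. */
    static String asDisjunction(List<List<String>> paths) {
        List<String> clauses = new ArrayList<>();
        for (List<String> p : paths) {
            clauses.add("(" + String.join(" ", p) + ")");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // "ny" (positions 0 to 2) is a synonym of "new york"; "pizza" follows either form.
        List<Arc> arcs = Arrays.asList(
            new Arc("ny", 0, 2),
            new Arc("new", 0, 1),
            new Arc("york", 1, 2),
            new Arc("pizza", 2, 3));

        System.out.println(asDisjunction(paths(arcs, 0, 3)));
        // prints (ny pizza) OR (new york pizza)
    }
}
```

Note that the number of paths can grow multiplicatively when several multi-token synonyms appear in one query, which is one reason a per-path rewrite may be worth revisiting.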

At long last, this (along with using {{SynonymGraphFilter}} at search time) 
fixes the long-standing bugs around multi-token synonyms, e.g. 
LUCENE-4499, LUCENE-1622, 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter
 ...

This will also be useful for other tokenizers/token filters, e.g. I'm 
working on having {{WordDelimiterFilter}} set position length correctly, and 
Kuromoji ({{JapaneseTokenizer}}) already produces graph tokens.

> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can 
> use multi-term synonyms at query time.  A "graph token stream" will be 
> created, which is nothing more than using the position length attribute on 
> stacked tokens to indicate how many positions a token should span.  Currently 
> the position length attribute on tokens is ignored during query parsing.  
> This issue will add support for handling these graph token streams inside the 
> QueryBuilder utility class used by query parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

