[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824358#comment-15824358 ]
Jim Ferenczi commented on LUCENE-7638:
--------------------------------------

[~mattweber] I don't think we lose minimum should match support. It will behave differently, but interestingly it would also solve some problems. For instance, with the all-paths solution, given a synonym like "ny, new york" and a minimum should match of 1, searching for "ny" would not return documents matching "new york". With the proposed solution each multi-term synonym is considered a single clause, so "ny" and "new york" count for 1.
I like the finite strings solution because expressing the minimum should match as a percentage gives you correct hits. This is great, though it requires duplicating a lot of terms, so I wonder if this is something that we should really target. By considering each multi-term synonym as 1 clause we could simplify the problem and produce a more optimized query?

> Optimize graph query produced by QueryBuilder
> ---------------------------------------------
>
>                 Key: LUCENE-7638
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7638
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jim Ferenczi
>         Attachments: LUCENE-7638.patch
>
>
> The QueryBuilder creates a graph query when the underlying TokenStream
> contains tokens with a PositionLengthAttribute greater than 1.
> These TokenStreams are in fact graphs (lattices, to be more precise) where
> synonyms can span multiple terms.
> Currently the graph query is built by visiting all the paths of the graph
> TokenStream. For instance, if you have a synonym like "ny, new york" and you
> search for "new york city", the query builder would produce two paths:
> "new york city", "ny city"
> This can quickly explode when the number of multi-term synonyms increases.
> The query "ny ny" for instance would produce 4 paths, and so on.
> For boolean queries with should or must clauses it should be more efficient
> to build a boolean query that merges all the intersections in the graph.
> So instead of "new york city", "ny city" we could produce:
> "+((+new +york) ny) +city"
> The attached patch is a proposal to do that instead of the all-paths solution.
> The patch transforms multi-term synonyms into a graph query for each
> intersection in the graph. This is not done in this patch, but we could also
> create a specialized query that gives equivalent scores to multi-term
> synonyms, like the SynonymQuery does for single-term synonyms.
> For phrase queries this patch does not change the current behavior, but we
> could also use the new method to create an optimized graph SpanQuery.
> [~mattweber] I think this patch could optimize a lot of cases where multiple
> multi-term synonyms are present in a single request. Could you take a look?
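
To make the difference between the two strategies in the issue description concrete, here is a minimal sketch in plain Python. It does not use Lucene and is not the code from the attached patch. The `segments` data model (a list of positions, each holding alternative term sequences) and the function names `all_paths` and `merged_query` are assumptions made only for this illustration. It enumerates the paths the all-paths approach would visit and builds the merged boolean query string quoted in the description.

```python
from itertools import product

def all_paths(segments):
    """Enumerate every path: pick one alternative per segment (all-paths approach)."""
    return [" ".join(term for alt in choice for term in alt)
            for choice in product(*segments)]

def merged_query(segments):
    """Build one MUST clause per segment; synonym alternatives become SHOULD clauses."""
    clauses = []
    for alternatives in segments:
        if len(alternatives) == 1:
            # No synonym at this position: every term is a plain MUST clause.
            clauses.extend("+" + term for term in alternatives[0])
            continue
        parts = []
        for alt in alternatives:
            if len(alt) == 1:
                parts.append(alt[0])
            else:
                # A multi-term alternative requires all of its terms.
                parts.append("(" + " ".join("+" + term for term in alt) + ")")
        clauses.append("+(" + " ".join(parts) + ")")
    return " ".join(clauses)

# Hypothetical token graph for the synonym rule "ny, new york".
new_york = [["new", "york"], ["ny"]]

print(all_paths([new_york, [["city"]]]))     # ['new york city', 'ny city']
print(merged_query([new_york, [["city"]]]))  # +((+new +york) ny) +city
print(len(all_paths([new_york, new_york])))  # 4 paths for "ny ny"
```

In this model the number of paths is the product of the number of alternatives at each position, while the merged form has one top-level clause per position. That is why treating each multi-term synonym as a single clause keeps the query small.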