[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

Uwe Schindler (JIRA) Fri, 19 Oct 2018 09:52:10 -0700


    [ https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657074#comment-16657074 ]


Uwe Schindler edited comment on SOLR-12243 at 10/19/18 4:51 PM:
----------------------------------------------------------------

That's what I mean, it's still linked together. The main bug is still in 
Lucene: the Lucene query builder uses the wrong query type and therefore does 
not correctly implement span queries on multi-term synonyms. The issues here 
come from the fact that dismax relies on the internal implementation of the 
Lucene code, which is not a good thing. The Solr code should not do this; 
instead we should add something to Lucene that can create those pf 
auto-phrase queries. I was missing that in my own query parser, too. So 
basically it would be good to have an additional query builder method in 
Lucene that analyzes some text and then builds configurable shingles that are 
connected with span/phrase queries using a slop. This code should not depend 
on the structure of a span/boolean query that was parsed before.

I'd like to wait a few days until the Lucene issue is solved and then review 
the changes here and adapt them as necessary. In the longer term, I'd like to 
get rid of the query instanceof spaghetti code and move the query construction 
for dismax-like queries using term shingles (bigrams, trigrams) to a separate 
builder class, so it is more reusable.
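
To make the idea concrete, here is a minimal sketch of what such a shingle 
builder could look like. It is only an illustration under assumptions: the 
class name ShinglePhraseSketch and the method shinglePhrases are hypothetical 
and are not part of Lucene or Solr, the input is assumed to be already 
analyzed tokens, synonym expansion is left out, and the output is plain query 
strings rather than real span/phrase query objects. The point is only that 
the shingles are built from the token list, not from a previously parsed 
query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; NOT a Lucene or Solr class. */
public final class ShinglePhraseSketch {

    private ShinglePhraseSketch() {}

    /**
     * Builds one sloppy phrase clause per run of {@code size} adjacent tokens.
     * Assumption: tokens are already analyzed; synonyms are not expanded here.
     */
    public static List<String> shinglePhrases(String field, List<String> tokens,
                                              int size, int slop) {
        if (size < 2) {
            throw new IllegalArgumentException("shingle size must be >= 2");
        }
        List<String> clauses = new ArrayList<>();
        // Slide a window of `size` tokens over the input; no window, no clause.
        for (int start = 0; start + size <= tokens.size(); start++) {
            String phrase = String.join(" ", tokens.subList(start, start + size));
            clauses.add(field + ":\"" + phrase + "\"~" + slop);
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("allergic", "reaction", "dog");
        System.out.println(shinglePhrases("title", tokens, 2, 11)); // bigrams, like pf2
        System.out.println(shinglePhrases("title", tokens, 3, 22)); // trigrams, like pf3
    }
}
```

With the tokens from the issue description, the bigram call yields clauses 
for "allergic reaction" and "reaction dog", and the trigram call yields one 
clause for "allergic reaction dog". A real implementation would have to work 
on the analyzed token stream, including multi-term synonyms.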


was (Author: thetaphi):
That's what I mean, it's still linked together. The main bug is still in 
Lucene, because the Lucene Query builder creates a query that does not 
correctly implement span queries on multi-term synonyms, because it uses the 
wrong query type. The issues here are coming from the fact that dismax relies 
on the internal implementation of the lucene code, which is not a good thing. 
The solr code should not do this and instead we should add something into 
Lucene that can create those pf auto-phrase queries. I was missing that in an 
own query parser, too. So basically it would be good to have some additional 
query builder method in Lucene that analyzes some text and then builds 
configurable shingles that are connected with span/phrase using a slop. This 
code should not depend on the structure of a span/boolean query that was parsed 
before.

I'd like to wait a few days until the Lucene issue is solved and then review 
the changes here and adapt them as necessary. On the longer term, I'd like to 
get rid of the query instance of shingling and move the query construction for 
dismax-like queries to a separate builder class. So it's better reusable.

> Edismax missing phrase queries when phrases contain multiterm synonyms
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12243
>                 URL: https://issues.apache.org/jira/browse/SOLR-12243
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.1
>         Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>            Reporter: Elizabeth Haubert
>            Assignee: Uwe Schindler
>            Priority: Major
>         Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch, SOLR-12243.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> <requestHandler name="/test_qparse_error" class="solr.SearchHandler">
>  <lst name="defaults">
> <!-- Query settings -->
>  <str name="defType">edismax</str>
>  <str name="tie"> 0.4</str>
>  <str name="qf">title^100</str>
>  <str name="pf">title~20^5000</str>
>  <str name="pf2">title~11</str>
>  <str name="pf3">title~22^1000</str>
>  <str name="df">text</str>
>  <!-- mm If three or fewer clauses exist, they all must match. 
>  If four to six clauses exist, one can be missing. If seven to nine clauses 
> exist, all but three must match. 
>  If more than nine clauses exist, only require 30% to match.-->
>  <str name="mm">3&lt;-1 6&lt;-3 9&lt;30%</str>
>  <str name="q.alt">*:*</str>
>  <str name="rows">25</str>
> </lst>
> </requestHandler>
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

