[ 
https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rudi Seitz updated SOLR-16652:
------------------------------
    Description: 
The presence of a multi-term synonym equivalence rule applied at query time 
prevents matching on individual terms in the synonym.

If we issue an edismax query against a text_general field in Solr 9.1 and the 
query string is "foo bar", we can match documents that have "foo" without "bar" 
and vice versa. However, if a synonym rule like "foo bar,baz" is applied at 
query time, we no longer get single-term matches against "foo" or "bar". Both 
terms are now required, although they can occur in any position: a document 
can match the query if it contains "foo bar", "bar foo", or "bar qux foo", for 
example, but not if it contains only "foo".

However, if we change the text_general analysis chain to apply synonyms at 
index time, the observed behavior also changes and single-term matches for 
"foo" or "bar" are again possible.

Why is this an issue? 1) It is counterintuitive that a synonym equivalence (as 
opposed to a unidirectional mapping) would give narrower recall than having no 
rule at all, and 2) this behavior represents a discrepancy in semantics between 
index-time and query-time synonym expansion.

 

*STEPS TO REPRODUCE*

Use the _default configset with "foo bar,baz" added to synonyms.txt. Index 
these four docs:

{"id":"1", "title_txt":"foo"}
{"id":"2", "title_txt":"bar"}
{"id":"3", "title_txt":"foo bar"}
{"id":"4", "title_txt":"bar foo"}

Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo bar).
Result: Only docs 3 and 4 come back.

Issue a query for "bar foo".
Result: All four docs come back; the synonym rule is not invoked.
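A minimal sketch of the reproduction steps above using only the Python standard library. It assumes a Solr instance at http://localhost:8983 and a hypothetical collection named "synonym_test" created from the _default configset with "foo bar,baz" added to synonyms.txt; the collection name is illustrative only. The /update and /select handlers and the defType, q.op, qf, fl and debug parameters are standard Solr; debug=query adds the parsed query to the response.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local Solr with a hypothetical collection "synonym_test" created
# from the _default configset, with "foo bar,baz" added to synonyms.txt.
SOLR = "http://localhost:8983/solr/synonym_test"

DOCS = [
    {"id": "1", "title_txt": "foo"},
    {"id": "2", "title_txt": "bar"},
    {"id": "3", "title_txt": "foo bar"},
    {"id": "4", "title_txt": "bar foo"},
]


def update_request(docs):
    """POST the docs as a JSON array to /update and commit immediately."""
    return urllib.request.Request(
        SOLR + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def select_url(q):
    """Build the edismax request from the report; debug=query adds parsedquery."""
    params = {
        "defType": "edismax",
        "q.op": "OR",
        "qf": "title_txt",
        "q": q,
        "fl": "id",
        "debug": "query",
    }
    return SOLR + "/select?" + urllib.parse.urlencode(params)


def run():
    """Index the four docs, then print matching ids and the parsed query."""
    urllib.request.urlopen(update_request(DOCS)).close()
    for q in ("foo bar", "bar foo"):
        with urllib.request.urlopen(select_url(q)) as resp:
            body = json.load(resp)
        ids = sorted(doc["id"] for doc in body["response"]["docs"])
        print(repr(q), ids, body["debug"]["parsedquery"])
```

Calling run() against a live instance configured as described should print the matching ids and the parsed query for each of the two query strings.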
 

*OBSERVATIONS*

Note that we could change the synonym rule to "foo bar,baz,foo,bar" but this 
would mean that a query for "foo" could now match a document containing only 
"bar", which is not the intent of the original rule.
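As a rough illustration of why the widened rule changes the semantics, here is a toy model (not Solr's SynonymMap code). It assumes that, with expand=true, every phrase on a comma-separated synonyms.txt line becomes equivalent to every other phrase on that line; parse_rules and expansions are hypothetical helpers.

```python
def parse_rules(lines):
    """Map each phrase to every phrase on its comma-separated line (expand=true)."""
    table = {}
    for line in lines:
        group = [p.strip() for p in line.split(",") if p.strip()]
        for phrase in group:
            table.setdefault(phrase, set()).update(group)
    return table


def expansions(table, phrase):
    """Phrases a query for `phrase` would also match; unlisted phrases map to themselves."""
    return table.get(phrase, {phrase})


original = parse_rules(["foo bar,baz"])
widened = parse_rules(["foo bar,baz,foo,bar"])
```

Under the original rule "foo" expands only to itself, while under the widened rule expansions(widened, "foo") also contains "bar", so a query for "foo" could match a document that contains only "bar".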

Note that we could set sow=true, but this would prevent the multi-term synonym 
from taking effect: the "foo bar" query could then get single-term matches on 
"foo" or "bar" but could not match the synonym "baz".
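A simplified model of the sow trade-off, for illustration only (not Solr's implementation). It assumes that with sow=true the parser splits on whitespace before analysis, so each term reaches the synonym filter alone and the two-token left-hand side "foo bar" can never match, whereas with sow=false the whole string is analyzed as one token stream. The helpers analysis_chunks and rule_fires are hypothetical. The same model suggests why "bar foo" does not invoke the rule: only the contiguous sequence foo, bar matches.

```python
# Toy model: which token sequences reach the synonym filter for each sow value.
RULE = ("foo", "bar")  # left-hand side of the multi-term rule "foo bar,baz"


def analysis_chunks(q, sow):
    """sow=True: each whitespace-separated term is analyzed on its own.
    sow=False: the whole query string is analyzed as one token stream."""
    tokens = q.lower().split()
    return [[t] for t in tokens] if sow else [tokens]


def rule_fires(q, sow, rule=RULE):
    """True if the rule's tokens occur contiguously within a single chunk."""
    n = len(rule)
    return any(
        tuple(chunk[i:i + n]) == rule
        for chunk in analysis_chunks(q, sow)
        for i in range(len(chunk) - n + 1)
    )
```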
 
Returning to the original "foo bar,baz" synonym rule with sow=false, if we look 
at the debug output for the "foo bar" query, we see this parsed query:

{{+((title_txt:baz (+title_txt:foo +title_txt:bar)))}}

For "bar foo" we see:

{{+((title_txt:bar) (title_txt:foo))}}
 
So, the observed behavior makes sense according to the low-level query 
structure, but is still counterintuitive for the reasons described above.

Why not expand the "foo bar" query like this instead?

{{+((title_txt:baz (title_txt:foo title_txt:bar)))}}
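To make the comparison concrete, here is a toy evaluator for the boolean structures above (not Lucene's code). It assumes the usual BooleanQuery semantics: every MUST clause has to match, and when there are no MUST clauses at least one SHOULD clause must match. Documents are modeled as sets of terms, which is enough here because every clause is a term query and ignores position. The helper names (term, bq, must, should, matches, matching_ids) are hypothetical.

```python
def term(t):
    return ("term", t)


def bq(*clauses):
    """A boolean query: a list of (occur, subquery) pairs."""
    return ("bool", list(clauses))


def must(q):
    return ("MUST", q)


def should(q):
    return ("SHOULD", q)


def matches(q, doc_terms):
    if q[0] == "term":
        return q[1] in doc_terms
    musts = [sub for occur, sub in q[1] if occur == "MUST"]
    shoulds = [sub for occur, sub in q[1] if occur == "SHOULD"]
    if musts:
        # SHOULD clauses are optional once any MUST clause is present.
        return all(matches(sub, doc_terms) for sub in musts)
    return any(matches(sub, doc_terms) for sub in shoulds)


# title_txt terms of the four docs from the reproduction steps.
TITLE_TERMS = {"1": {"foo"}, "2": {"bar"}, "3": {"foo", "bar"}, "4": {"bar", "foo"}}


def matching_ids(q):
    return sorted(i for i, terms in TITLE_TERMS.items() if matches(q, terms))


# Current "foo bar": +((title_txt:baz (+title_txt:foo +title_txt:bar)))
current = bq(must(bq(should(term("baz")),
                     should(bq(must(term("foo")), must(term("bar")))))))
# Proposed "foo bar": +((title_txt:baz (title_txt:foo title_txt:bar)))
proposed = bq(must(bq(should(term("baz")),
                      should(bq(should(term("foo")), should(term("bar")))))))
# "bar foo": +((title_txt:bar) (title_txt:foo))
bar_foo = bq(must(bq(should(term("bar")), should(term("foo")))))
```

In this model matching_ids(current) returns ["3", "4"], as in the report, while matching_ids(proposed) and matching_ids(bar_foo) return all four ids; a document containing only "baz" still matches both the current and the proposed form.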
 

 

 

  was:
The presence of a multi-term synonym equivalence rule applied at query time 
prevents matching on individual terms in the synonym.

If we issue an edismax query against a text_general field in Solr 9.1, and the 
query string is "foo bar," we can match documents that have "foo" without "bar" 
and vice versa. However, if there is a synonym rule like "foo bar,baz" applied 
at query time, we no longer get single-term matches against "foo" or "bar." 
Both terms are now required, but can occur in any position: a document can 
match the query if it contains "foo bar" or "bar foo" or "bar qux foo", for 
example, but not if it only contains "foo".

However, if we change the text_general analysis chain to apply synonyms at 
index time, the observed behavior also changes and single-term matches for 
"foo" or "bar" are again possible.

Why is this an issue? 1) it is counterintuitive that a synonym equivalence (as 
opposed to a unidirectional mapping) would give narrower recall than without 
the rule, 2) this behavior represents a discrepancy in semantics between 
index-time and query-time synonym expansion.

 

*STEPS TO REPRODUCE*

Use the _default configset with "foo bar,baz" added to synonyms.txt. Index 
these four docs:

{{{"id":"1", "title_txt":"foo"} }}

{{{"id":"2", "title_txt":"bar"} }}

{{{"id":"3", "title_txt":"foo bar"} }}

{{{"id":"4", "title_txt":"bar foo"}}}

 
Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo 
bar)
Result: Only docs 3 and 4 come back
 
Issue a query for "bar foo"
Result: All four docs come back; the synonym rule is not invoked
 

*OBSERVATIONS*

Note that we could change the synonym rule to "foo bar,baz,foo,bar" but this 
would mean that a query for "foo" could now match a document containing only 
"bar", which is not the intent of the original rule.

Note that we could set sow=true but this would prevent the multi-term synonym 
from taking effect: the "foo bar" query could now get single-term matches on 
"foo" or "bar" but couldn't get a match on the synonym "baz"
 
Returning to the original "foo bar,baz" synonym rule with sow=false, if we look 
at the explain output for the "foo bar" query we see:

{{+((title_txt:baz (+title_txt:foo +title_txt:bar)))}}
 
Looking at the explain output for "bar foo" we see:

{{+((title_txt:bar) (title_txt:foo))}}
 
So, the observed behavior makes sense according to the low-level query 
structure, but is still counterintuitive for the reasons described above.
 
Why not expand the "foo bar" query like this instead?
 
{{+((title_txt:baz (title_txt:foo title_txt:bar)))}}
 

 

 


> multi-term synonym rule applied at query time prevents single-term matching
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-16652
>                 URL: https://issues.apache.org/jira/browse/SOLR-16652
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 9.1
>            Reporter: Rudi Seitz
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
