[jira] [Commented] (SOLR-16652) multi-term synonym rule applied at query time prevents single-term matching
[ https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687998#comment-17687998 ]

Mikhail Khludnev commented on SOLR-16652:
---

It seems to be by design:
https://github.com/apache/lucene/blob/main/lucene/queryparser/src/test/org/apache/lucene/queryparser/classic/TestQueryParser.java#L591
It sets a multi-word synonym: {{"guinea pig => cavy"}}
{{dumb.parse("guinea pig") => ((+field:guinea +field:pig) field:cavy)}}
The resulting query doesn't match 'guinea' alone, which is what this ticket expects.

> multi-term synonym rule applied at query time prevents single-term matching
> ---
>
> Key: SOLR-16652
> URL: https://issues.apache.org/jira/browse/SOLR-16652
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: 9.1
> Reporter: Rudi Seitz
> Priority: Major
>
> The presence of a multi-term synonym equivalence rule applied at query time
> prevents matching on individual terms in the synonym.
> If we issue an edismax query against a text_general field in Solr 9.1, and
> the query string is "foo bar," we can match documents that have "foo" without
> "bar" and vice versa. However, if there is a synonym rule like "foo bar,baz"
> applied at query time, we no longer get single-term matches against "foo" or
> "bar." Both terms are now required, but can occur in any position: a document
> can match the query if it contains "foo bar" or "bar foo" or "bar qux foo",
> for example, but not if it only contains "foo".
> However, if we change the text_general analysis chain to apply synonyms at
> index time, the observed behavior also changes and single-term matches for
> "foo" or "bar" are again possible.
> Why is this an issue? 1) It is counterintuitive that a synonym equivalence
> (as opposed to a unidirectional mapping) would give narrower recall than
> having no rule at all, and 2) this behavior represents a discrepancy in
> semantics between index-time and query-time synonym expansion.
>
> *STEPS TO REPRODUCE*
> Use the _default configset with "foo bar,baz" added to synonyms.txt. Index
> these four docs:
>
> {"id":"1", "title_txt":"foo"}
>
> {"id":"2", "title_txt":"bar"}
>
> {"id":"3", "title_txt":"foo bar"}
>
> {"id":"4", "title_txt":"bar foo"}
>
> Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
> Result: Only docs 3 and 4 come back
>
> Issue a query for "bar foo"
> Result: All four docs come back; the synonym rule is not invoked
>
> *OBSERVATIONS*
> Note that we could change the synonym rule to "foo bar,baz,foo,bar", but this
> would mean that a query for "foo" could now match a document containing only
> "bar", which is not the intent of the original rule.
> Note that we could set sow=true, but this would prevent the multi-term synonym
> from taking effect: the "foo bar" query could now get single-term matches on
> "foo" or "bar" but couldn't get a match on the synonym "baz".
>
> Returning to the original "foo bar,baz" synonym rule with sow=false, if we
> look at the explain output for the "foo bar" query we see:
> {{+((title_txt:baz (+title_txt:foo +title_txt:bar)))}}
>
> Looking at the explain output for "bar foo" we see:
> {{+((title_txt:bar) (title_txt:foo))}}
>
> So, the observed behavior makes sense according to the low-level query
> structure, but is still counterintuitive for the reasons described above.
>
> Why not expand the "foo bar" query like this instead?
>
> {{+((title_txt:baz (title_txt:foo title_txt:bar)))}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
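To make the query structures quoted in the issue description concrete, here is a toy Python model of BooleanQuery matching, evaluated against the four reproduction documents. It assumes simplified semantics (MUST clauses are required; SHOULD clauses are optional when a MUST is present, otherwise at least one must match) and ignores scoring, positions, and minimumShouldMatch. The `Term`/`Bool` classes and the `run` helper are hypothetical illustrations, not Lucene code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Term:
    text: str


@dataclass
class Bool:
    must: List["Query"] = field(default_factory=list)
    should: List["Query"] = field(default_factory=list)


Query = Union[Term, Bool]


def matches(q: Query, terms: Set[str]) -> bool:
    """Simplified BooleanQuery matching: no scoring, positions, or minimumShouldMatch."""
    if isinstance(q, Term):
        return q.text in terms
    if not all(matches(c, terms) for c in q.must):
        return False
    # With at least one MUST clause, SHOULD clauses are optional (they only affect score).
    return bool(q.must) or any(matches(c, terms) for c in q.should)


# The four documents from the reproduction steps (whitespace "analysis" only).
DOCS: Dict[str, str] = {"1": "foo", "2": "bar", "3": "foo bar", "4": "bar foo"}


def run(q: Query) -> List[str]:
    return [doc_id for doc_id, text in DOCS.items() if matches(q, set(text.split()))]


# Observed for "foo bar": +((title_txt:baz (+title_txt:foo +title_txt:bar)))
CURRENT = Bool(must=[Bool(should=[Term("baz"), Bool(must=[Term("foo"), Term("bar")])])])
# Observed for "bar foo": +((title_txt:bar) (title_txt:foo))
REVERSED = Bool(must=[Bool(should=[Term("bar"), Term("foo")])])
# Proposed in the issue: +((title_txt:baz (title_txt:foo title_txt:bar)))
PROPOSED = Bool(must=[Bool(should=[Term("baz"), Bool(should=[Term("foo"), Term("bar")])])])

print(run(CURRENT))   # ['3', '4']
print(run(REVERSED))  # ['1', '2', '3', '4']
print(run(PROPOSED))  # ['1', '2', '3', '4']
```

Under these assumptions, only the inner `(+foo +bar)` clause differs between the observed and proposed structures, and that difference alone is enough to drop docs 1 and 2.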
[jira] [Commented] (SOLR-16652) multi-term synonym rule applied at query time prevents single-term matching
[ https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687959#comment-17687959 ]

Rudi Seitz commented on SOLR-16652:
---

If the original rule is "foo bar,baz", I believe Mikhail's suggestion to convert it to a directional rule would work, but we'd need two of them:

foo bar=>baz,foo,bar
baz=>foo bar
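Rudi's pair of directional rules can be compared with the original equivalence and with the widened "foo bar,baz,foo,bar" workaround in a similar toy model. The sketch assumes a simplified reading of the synonyms.txt format (comma-separated equivalences behaving as with expand=true, `=>` for explicit replacement) and treats each expansion alternative as an AND of its tokens, mirroring the MUST behavior discussed in this ticket. `parse_rules`, `expand`, and `matching_docs` are hypothetical helpers, not Solr APIs, and a rule fires only when it matches the whole query.

```python
from typing import Dict, List


def parse_rules(lines: List[str]) -> Dict[str, List[str]]:
    """Parse a simplified subset of the synonyms.txt format.

    "a,b,c"     equivalence (as with expand=true): each entry maps to all entries
    "a,b=>c,d"  explicit mapping: each left-hand entry is replaced by the right-hand entries
    """
    rules: Dict[str, List[str]] = {}
    for line in lines:
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            outputs = [s.strip() for s in rhs.split(",")]
            for src in lhs.split(","):
                rules.setdefault(src.strip(), []).extend(outputs)
        else:
            group = [s.strip() for s in line.split(",")]
            for src in group:
                rules.setdefault(src, []).extend(group)
    return rules


def expand(query: str, rules: Dict[str, List[str]]) -> List[List[str]]:
    # Toy simplification: a rule fires only if it matches the whole query string,
    # whereas a real synonym filter matches rule inputs anywhere in the token stream.
    if query in rules:
        return [alt.split() for alt in rules[query]]
    # No rule fired: with q.op=OR every token is an optional clause of its own.
    return [[tok] for tok in query.split()]


def matching_docs(query: str, rules: Dict[str, List[str]], docs: Dict[str, str]) -> List[str]:
    alternatives = expand(query, rules)
    hits = []
    for doc_id, text in docs.items():
        terms = set(text.split())
        # All tokens within one alternative are required (the MUST clauses);
        # a document only needs to satisfy one alternative.
        if any(all(tok in terms for tok in alt) for alt in alternatives):
            hits.append(doc_id)
    return hits


DOCS = {"1": "foo", "2": "bar", "3": "foo bar", "4": "bar foo"}
ORIGINAL = parse_rules(["foo bar,baz"])
WIDENED = parse_rules(["foo bar,baz,foo,bar"])
DIRECTIONAL = parse_rules(["foo bar=>baz,foo,bar", "baz=>foo bar"])

print(matching_docs("foo bar", ORIGINAL, DOCS))     # ['3', '4']
print(matching_docs("foo bar", DIRECTIONAL, DOCS))  # ['1', '2', '3', '4']
print(matching_docs("foo", WIDENED, DOCS))          # ['1', '2', '3', '4'] (doc 2 has only "bar")
print(matching_docs("foo", DIRECTIONAL, DOCS))      # ['1', '3', '4']
```

Under these assumptions, the directional pair restores single-term matches for "foo bar" without letting a "foo" query reach the "bar"-only document 2, which is the side effect the widened equivalence has.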
[jira] [Commented] (SOLR-16652) multi-term synonym rule applied at query time prevents single-term matching
[ https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687954#comment-17687954 ]

Rudi Seitz commented on SOLR-16652:
---

From [~mkhl] via us...@solr.apache.org:

{quote}Thanks for raising a ticket. Here are just two considerations:

> we could change the synonym rule to "foo bar,baz,foo,bar" but this would mean that a query for "foo" could now match a document containing only "bar", which is not the intent of the original rule.

OK. The latter issue can probably be fixed by making the synonyms directional:
foo bar=>baz,foo,bar

Right, it seems like a weird band-aid.

I stepped through the Lucene code; the MUST occurrence for synonyms is defined here:
[https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L534]
Presumably, the original terms could go with the defaultOperator, while the synonym replacement keeps MUST.
{quote}
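Mikhail's suggestion (the original terms follow the default operator, while synonym replacements keep MUST) can be illustrated by rendering the query strings it would produce. This is a hypothetical sketch of how the clauses might be assembled, not a patch to Lucene's QueryBuilder: `render_path` and `build_query` are illustrative names, and the code only builds strings in Lucene query syntax.

```python
from typing import List


def render_path(tokens: List[str], field: str, occur: str) -> str:
    """Render one path of the token graph in Lucene query syntax.

    occur is "+" (MUST) or "" (SHOULD); a single-token path is a bare term query.
    """
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    return "(" + " ".join(f"{occur}{field}:{t}" for t in tokens) + ")"


def build_query(original: List[str], synonyms: List[List[str]],
                field: str, default_op: str) -> str:
    """Hypothetical clause assembly following the suggestion in this ticket:
    the original path uses the default operator (q.op), while multi-word
    synonym paths keep MUST for every token."""
    original_occur = "+" if default_op == "AND" else ""
    paths = [render_path(s, field, "+") for s in synonyms]
    paths.append(render_path(original, field, original_occur))
    return "(" + " ".join(paths) + ")"


# Query "foo bar" with the rule "foo bar,baz":
print(build_query(["foo", "bar"], [["baz"]], "title_txt", "OR"))
# (title_txt:baz (title_txt:foo title_txt:bar))
print(build_query(["foo", "bar"], [["baz"]], "title_txt", "AND"))
# (title_txt:baz (+title_txt:foo +title_txt:bar))

# Query "baz" with the same rule: the multi-word synonym still requires both terms.
print(build_query(["baz"], [["foo", "bar"]], "title_txt", "OR"))
# ((+title_txt:foo +title_txt:bar) title_txt:baz)
```

With `default_op="OR"` the inner clause matches the structure the issue proposes, and with `"AND"` it matches the structure the issue reports for the "foo bar" query. Meanwhile, a multi-word synonym path reached from a "baz" query keeps requiring both of its terms.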