Doug Turnbull created SOLR-11662:
------------------------------------
Summary: More than SynonymQuery: Let overlapping query terms model
hypernym/hyponym relationships
Key: SOLR-11662
URL: https://issues.apache.org/jira/browse/SOLR-11662
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Doug Turnbull
Fix For: 7.2, master (8.0)
This patch customizes the query-time behavior when query terms overlap
positions. Right now the only option is SynonymQuery. This is a fantastic
default & improvement on past versions. However, there are use cases where
terms overlap positions but don't carry exact synonymy relationships. Often
synonym filters (or other analyzers) are actually used to model
hypernym/hyponym relationships. In those cases the individual term scores
matter, with terms of higher specificity (hyponyms) scoring higher than terms
of lower specificity (hypernyms).
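As a rough illustration of why the more specific term tends to score higher, the sketch below computes a BM25-style inverse document frequency for three terms. The class name, the document counts, and the terms themselves are hypothetical values chosen for illustration; they are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpecificityIdf {
    // BM25-style inverse document frequency: rarer terms get larger values.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 10_000; // hypothetical index size
        Map<String, Long> docFreqs = new LinkedHashMap<>();
        docFreqs.put("tabby", 20L);      // hypothetical: rare, most specific (hyponym)
        docFreqs.put("cat", 500L);       // hypothetical: more common
        docFreqs.put("animal", 4_000L);  // hypothetical: very common (hypernym)
        for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
            System.out.printf("%s idf=%.3f%n", e.getKey(), idf(e.getValue(), docCount));
        }
    }
}
```

Under these assumed counts the rare term gets the largest weight and the broad term the smallest, which is the ordering that is lost when all overlapping terms are scored as one.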
This patch adds the fieldType setting scoreOverlaps, as in:
{code:java}
<fieldType name="text_general" scoreOverlaps="pick_best"
class="solr.TextField" positionIncrementGap="100" multiValued="true">
{code}
Valid values for scoreOverlaps are:
*as_one_term*
Default; suits most synonym use cases. Uses SynonymQuery.
Treats all terms as if they're exactly equivalent, with the document
frequencies of the underlying terms blended
*pick_best*
For a given document, score using the best-scoring synonym (i.e. dismax over
generated terms).
Useful when synonyms are not exactly equivalent and are instead used to model
hypernym/hyponym relationships, such as expanding to synonyms whose term
scores reflect that relationship
E.g. with this query-time expansion
tabby => tabby, cat, animal
searching the field "text" for tabby generates the dismax (text:tabby | text:cat | text:animal)
*as_distinct_terms*
(The pre 6.0 behavior.)
Compromise between pick_best and as_one_term
Appropriate when synonyms reflect a hypernym/hyponym relationship, but lets
scores stack, so the more tabby, cat, or animal a document contains the better,
with a bias towards the term with highest specificity
Terms are turned into a boolean OR query, with document frequencies not blended
E.g. with this query-time expansion
tabby => tabby, cat, animal
searching the field "text" for tabby generates the boolean query (text:tabby text:cat text:animal)
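To make the difference between the three values concrete, here is a minimal sketch that combines per-term scores in each of the three ways. It is not the patch's code and does not use Lucene: the class and method names are hypothetical, the per-term score is a simplified BM25-like formula without length normalization, and modelling the blended document frequency as the maximum across terms is an assumption made for illustration.

```java
import java.util.Arrays;

class OverlapScoringSketch {
    static final double K1 = 1.2; // BM25-style term-frequency saturation constant

    // BM25-style inverse document frequency: rarer terms get larger values.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Simplified per-term score: idf times saturated term frequency.
    static double termScore(int tf, long docFreq, long docCount) {
        return idf(docFreq, docCount) * tf / (tf + K1);
    }

    // as_one_term: score the terms as a single term. Term frequencies are
    // summed and one blended document frequency is used (assumed here: the max).
    static double asOneTerm(int[] tfs, long[] docFreqs, long docCount) {
        int tf = Arrays.stream(tfs).sum();
        long df = Arrays.stream(docFreqs).max().orElse(0L);
        return termScore(tf, df, docCount);
    }

    // pick_best: dismax-style, keep only the best-scoring term.
    static double pickBest(int[] tfs, long[] docFreqs, long docCount) {
        double best = 0.0;
        for (int i = 0; i < tfs.length; i++) {
            best = Math.max(best, termScore(tfs[i], docFreqs[i], docCount));
        }
        return best;
    }

    // as_distinct_terms: boolean OR, per-term scores stack.
    static double asDistinctTerms(int[] tfs, long[] docFreqs, long docCount) {
        double sum = 0.0;
        for (int i = 0; i < tfs.length; i++) {
            sum += termScore(tfs[i], docFreqs[i], docCount);
        }
        return sum;
    }

    public static void main(String[] args) {
        long docCount = 10_000;              // hypothetical index size
        long[] docFreqs = {20, 500, 4_000};  // hypothetical: tabby, cat, animal
        int[] tfs = {1, 1, 1};               // document mentions each term once
        System.out.printf("as_one_term=%.3f pick_best=%.3f as_distinct_terms=%.3f%n",
                asOneTerm(tfs, docFreqs, docCount),
                pickBest(tfs, docFreqs, docCount),
                asDistinctTerms(tfs, docFreqs, docCount));
    }
}
```

In this simplified model a document matching only the most general term scores the same under all three settings, while a document matching the specific term is rewarded by pick_best and as_distinct_terms but not by as_one_term.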