[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

Alessandro Benedetti (JIRA) Wed, 25 Apr 2018 16:46:41 -0700

    [ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453250#comment-16453250 ]


Alessandro Benedetti commented on SOLR-12238:
---------------------------------------------

[~ehatcher] thanks!

1) At the moment this implementation is entirely query-time. When the query 
is parsed (built), the payload of each synonym is used to build a boosted 
synonym query.
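
To make the query-time behaviour concrete, here is a rough sketch in plain Java (no Lucene classes) of how the payload weights could combine under the two styles from the issue description. It assumes PICK_BEST_BOOST_BY_PAYLOAD behaves like a disjunction-max query with a tie-breaker of 0 (best boosted clause wins) and AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD like a Boolean query of optional clauses (boosted clause scores are summed). The class, the method names and the raw scores are hypothetical and only serve the illustration; this is not the code in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: models how per-synonym payload boosts
// could combine into a document score under the two proposed styles.
class BoostedSynonymScoreSketch {

    // PICK_BEST style (assumed disjunction-max, tie-breaker 0):
    // keep only the best boosted clause among the synonyms that matched.
    static float pickBest(Map<String, Float> payloadBoost, Map<String, Float> rawScore) {
        float best = 0f;
        for (Map.Entry<String, Float> e : payloadBoost.entrySet()) {
            Float raw = rawScore.get(e.getKey());
            if (raw == null) continue; // this synonym did not match the document
            best = Math.max(best, e.getValue() * raw);
        }
        return best;
    }

    // AS_DISTINCT_TERMS style (assumed Boolean query with optional clauses):
    // sum the boosted scores of every synonym that matched.
    static float asDistinctTerms(Map<String, Float> payloadBoost, Map<String, Float> rawScore) {
        float sum = 0f;
        for (Map.Entry<String, Float> e : payloadBoost.entrySet()) {
            Float raw = rawScore.get(e.getKey());
            if (raw == null) continue;
            sum += e.getValue() * raw;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Boosts taken from the "tiger" rule in the issue description.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("tiger", 1.0f);
        boosts.put("Big_Cat", 0.8f);
        boosts.put("Shere_Khan", 0.9f);

        // Made-up raw term scores for one document that matches two synonyms.
        Map<String, Float> raw = new LinkedHashMap<>();
        raw.put("tiger", 2.0f);
        raw.put("Big_Cat", 3.0f);

        System.out.println("pick best:         " + pickBest(boosts, raw));
        System.out.println("as distinct terms: " + asDistinctTerms(boosts, raw));
    }
}
```

The point of the sketch is only that the payload acts as a per-clause multiplier before the clauses are combined; the real scores come from the similarity in use.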

Given that, an index-time approach could be interesting and could work.
Why did you mention that the delimiters could cause issues?
The payload weight should be in the output synonyms, so the delimiters should 
not cause any matching problem.

Applying this at index time would imply modifying the similarity function to 
include a payload-aware variant (such as PayloadBM25Similarity).
Taking a look at org.apache.lucene.search.similarities.BM25Similarity, I 
noticed this:

/** The default implementation returns <code>1</code> */
protected float scorePayload(int doc, int start, int end, BytesRef payload) {
 return 1;
}

So maybe someone has already started something in that direction.
I will investigate and possibly open another Jira to track the index-time 
implementation.
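
For the index-time direction, a payload-aware similarity would have to turn the stored payload bytes back into a weight. A minimal sketch of that decoding step follows, assuming the float encoder stores the weight as the four big-endian bytes of its IEEE 754 bit pattern (my understanding of the float payload encoding, not verified against the source here). It uses a plain byte[] instead of BytesRef so that it runs without Lucene; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: round-trips a float weight through a 4-byte payload
// and falls back to 1 when no usable payload is present.
class FloatPayloadSketch {

    // Assumed encoding: 4 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] encodeFloat(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Payload-aware counterpart of the default "return 1" behaviour:
    // use the decoded weight when a payload exists, otherwise stay neutral.
    static float scorePayload(byte[] payload) {
        if (payload == null || payload.length < 4) {
            return 1f;
        }
        return ByteBuffer.wrap(payload, 0, 4).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encodeFloat(0.8f);
        System.out.println("decoded weight: " + scorePayload(payload));
        System.out.println("no payload:     " + scorePayload(null));
    }
}
```

Returning 1 for a missing payload keeps documents indexed without payloads scoring exactly as they do today.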

2) I haven't tried it, but the ManagedSynonymGraphFilter should build the same 
graph as the non-managed version, so I assume it should work fine (I will 
experiment and double check).

> Synonym Query Style Boost By Payload
> ------------------------------------
>
>                 Key: SOLR-12238
>                 URL: https://issues.apache.org/jira/browse/SOLR-12238
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>            Reporter: Alessandro Benedetti
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> makes it possible to boost synonym queries using the associated payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> builds a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> builds a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> An example synonym.txt could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" 
> delimiter="|"/>
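
As an illustration of the synonym format quoted above, the following plain-Java sketch splits the right-hand side of a rule into terms and weights on the "|" delimiter. It is a hypothetical helper, not part of Solr, and it assumes a weight of 1.0 for synonyms without an explicit payload (as in "leopard" above). The real analysis chain works token by token, so multi-word synonyms such as "panthera leo|0.99" would not necessarily be handled as a single unit the way they are here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "term|weight, term|weight, ..." into a map.
class SynonymPayloadParser {

    // Assumption of this sketch: a synonym without a payload gets weight 1.0.
    static final float DEFAULT_WEIGHT = 1.0f;

    static Map<String, Float> parseRightHandSide(String rhs) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String entry : rhs.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int bar = trimmed.lastIndexOf('|');
            if (bar < 0) {
                weights.put(trimmed, DEFAULT_WEIGHT);
            } else {
                String term = trimmed.substring(0, bar).trim();
                float weight = Float.parseFloat(trimmed.substring(bar + 1).trim());
                weights.put(term, weight);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Right-hand side of the "leopard" rule from the example file.
        System.out.println(parseRightHandSide("leopard, Big_Cat|0.8, Bagheera|0.9"));
    }
}
```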



