[jira] [Comment Edited] (SOLR-12238) Synonym Query Style Boost By Payload

Alessandro Benedetti (JIRA) Fri, 27 Apr 2018 08:48:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453250#comment-16453250
 ]


Alessandro Benedetti edited comment on SOLR-12238 at 4/27/18 3:47 PM:
----------------------------------------------------------------------

[~ehatcher] thanks!

1) At the moment this implementation works entirely at query time: when parsing 
(building) the query, the payload of each synonym is used to build a boosted 
synonym query.
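To make the query-time step concrete, here is a minimal standalone Java sketch (no Lucene dependency) of how each synonym's payload could become a clause boost. It assumes the payload holds a 4-byte big-endian float, which as far as I know matches the layout of Lucene's PayloadHelper.encodeFloat. It also assumes that a missing payload means a boost of 1.0. The class and method names are hypothetical and this is not the code in the pull request.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueryTimePayloadBoost {

    /** One synonym clause: the term text plus the boost read from its payload. */
    static final class BoostedClause {
        final String term;
        final float boost;

        BoostedClause(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return term + "^" + boost; // query-parser style rendering, e.g. big_cat^0.8
        }
    }

    /**
     * Decodes a 4-byte big-endian float payload.
     * Assumption: a missing or malformed payload means "no boost" (1.0f).
     */
    static float boostFromPayload(byte[] payload) {
        if (payload == null || payload.length != 4) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat(); // ByteBuffer is big-endian by default
    }

    /** Pairs each synonym term at the same position with its decoded boost. */
    static List<BoostedClause> buildClauses(List<String> terms, List<byte[]> payloads) {
        List<BoostedClause> clauses = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            clauses.add(new BoostedClause(terms.get(i), boostFromPayload(payloads.get(i))));
        }
        return clauses;
    }

    public static void main(String[] args) {
        byte[] bigCat = ByteBuffer.allocate(4).putFloat(0.8f).array();
        List<BoostedClause> clauses =
                buildClauses(Arrays.asList("tiger", "big_cat"), Arrays.asList(null, bigCat));
        System.out.println(clauses);
    }
}
```

The clauses produced here would then be combined according to the chosen synonym query style.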

Given that, an index-time approach could also be interesting and could work.
Why did you mention that the delimiters could cause issues?
The payload weight should only appear in the output synonyms, so it should not 
cause any matching problem.
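The matching argument can be checked with a simplified Java sketch. In an explicit mapping such as "tiger => tiger|1.0, Big_Cat|0.8", the delimiter only occurs on the right-hand (output) side, so the left-hand side used for matching stays clean. This is a hypothetical toy parser, not Solr's SolrSynonymParser, which also handles escaping and equivalent-synonym lines.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymRuleSketch {

    /** Left side (terms matched against the text) and right side (terms emitted). */
    static final class Rule {
        final List<String> inputs = new ArrayList<>();
        final List<String> outputs = new ArrayList<>();
    }

    /** Simplified parser for explicit "a, b => c, d" rules; no escaping support. */
    static Rule parse(String line) {
        String[] sides = line.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("not an explicit mapping: " + line);
        }
        Rule rule = new Rule();
        for (String s : sides[0].split(",")) {
            rule.inputs.add(s.trim());
        }
        for (String s : sides[1].split(",")) {
            rule.outputs.add(s.trim());
        }
        return rule;
    }

    /** True if any term carries the payload delimiter. */
    static boolean containsDelimiter(List<String> terms, char delimiter) {
        for (String t : terms) {
            if (t.indexOf(delimiter) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Rule rule = parse("tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9");
        System.out.println("inputs=" + rule.inputs + ", delimiter on input side: "
                + containsDelimiter(rule.inputs, '|'));
        System.out.println("outputs=" + rule.outputs + ", delimiter on output side: "
                + containsDelimiter(rule.outputs, '|'));
    }
}
```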

Applying this at index time would imply modifying the similarity function, 
introducing a payload-aware variant (such as PayloadBM25Similarity).
Looking at org.apache.lucene.search.similarities.BM25Similarity, I noticed 
this:

/** The default implementation returns <code>1</code> */
protected float scorePayload(int doc, int start, int end, BytesRef payload) {
  return 1;
}
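As a rough illustration of what a payload-aware similarity could do instead of the default "return 1", the standalone Java sketch below computes a textbook BM25 term score and multiplies it by a float decoded from the payload. It deliberately does not extend Lucene's BM25Similarity, whose internal scoring API changes between versions. The class, its method names and the multiplicative combination are all assumptions for illustration.

```java
import java.nio.ByteBuffer;

public class PayloadAwareBm25Sketch {

    // Common BM25 defaults (Lucene's BM25Similarity also defaults to k1=1.2, b=0.75).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 score of one query term in one document. */
    static double bm25(double freq, double docLen, double avgDocLen, long docCount, long docFreq) {
        double lengthNorm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * freq * (K1 + 1) / (freq + lengthNorm);
    }

    /** Hypothetical payload-aware replacement for the default "return 1". */
    static double scorePayload(byte[] payload) {
        if (payload == null || payload.length != 4) {
            return 1.0; // same neutral factor as the default implementation
        }
        return ByteBuffer.wrap(payload).getFloat(); // 4-byte big-endian float
    }

    /** Term score scaled by the weight stored in the payload at index time. */
    static double payloadBoostedScore(double freq, double docLen, double avgDocLen,
                                      long docCount, long docFreq, byte[] payload) {
        return bm25(freq, docLen, avgDocLen, docCount, docFreq) * scorePayload(payload);
    }

    public static void main(String[] args) {
        byte[] bigCat = ByteBuffer.allocate(4).putFloat(0.8f).array();
        double plain = payloadBoostedScore(1, 10, 12, 1000, 50, null);
        double boosted = payloadBoostedScore(1, 10, 12, 1000, 50, bigCat);
        System.out.printf("plain=%.4f boosted=%.4f%n", plain, boosted);
    }
}
```

Multiplying keeps a payload of 1.0 neutral, which is consistent with the current default.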

So maybe someone has already started something in that direction.
I will investigate and possibly open another Jira to track the index-time 
implementation.

2) I have just tried it and it works.
I first added the weighted synonyms (with the separator) with a REST PUT and 
verified that the managed synonym map was correct (I just pushed the additional 
test to the GitHub Pull Request).

Then I double-checked that ManagedSynonymGraphFilter builds the same graph as 
the non-managed version, and I could see the payload being extracted correctly.
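For reference, a Java 11+ sketch of building such a PUT request with java.net.http. The endpoint shape (/schema/analysis/synonyms/<resource> with a JSON map of term to synonym list) follows the Solr managed resources documentation. The base URL, the collection "techproducts" and the resource name "english" are placeholders, not the setup used for the test above. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ManagedSynonymsPutSketch {

    /** Minimal JSON string quoting, sufficient for the simple terms used here. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serializes {"term":["syn|weight", ...], ...}; the weights stay inside the strings. */
    static String toJson(Map<String, List<String>> mappings) {
        return mappings.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + e.getValue().stream()
                        .map(ManagedSynonymsPutSketch::quote)
                        .collect(Collectors.joining(",", "[", "]")))
                .collect(Collectors.joining(",", "{", "}"));
    }

    /** Builds (but does not send) the PUT request for a managed synonyms resource. */
    static HttpRequest buildPut(String solrBase, String collection, String resource, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(solrBase + "/" + collection + "/schema/analysis/synonyms/" + resource))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        Map<String, List<String>> mappings = new LinkedHashMap<>();
        mappings.put("tiger", List.of("tiger|1.0", "Big_Cat|0.8", "Shere_Khan|0.9"));
        String json = toJson(mappings);
        HttpRequest request = buildPut("http://localhost:8983/solr", "techproducts", "english", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```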


was (Author: alessandro.benedetti):
[~ehatcher] thanks!

1) At the moment this implementation works entirely at query time: when parsing 
(building) the query, the payload of each synonym is used to build a boosted 
synonym query.

Given that, an index-time approach could also be interesting and could work.
Why did you mention that the delimiters could cause issues?
The payload weight should only appear in the output synonyms, so it should not 
cause any matching problem.

Applying this at index time would imply modifying the similarity function, 
introducing a payload-aware variant (such as PayloadBM25Similarity).
Looking at org.apache.lucene.search.similarities.BM25Similarity, I noticed 
this:

/** The default implementation returns <code>1</code> */
protected float scorePayload(int doc, int start, int end, BytesRef payload) {
 return 1;
}

So maybe someone has already started something in that direction.
I will investigate and possibly open another Jira to track the index-time 
implementation.

2) I haven't tried it yet, but the ManagedSynonymGraphFilter should build the 
same graph as the non-managed version, so I assume it should work fine (I will 
experiment and double-check).

> Synonym Query Style Boost By Payload
> ------------------------------------
>
>                 Key: SOLR-12238
>                 URL: https://issues.apache.org/jira/browse/SOLR-12238
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>            Reporter: Alessandro Benedetti
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> makes it possible to boost synonym queries using the associated payloads.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> builds a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> builds a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
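A rough standalone Java approximation of what the filter configuration in the description does to each synonym token: it splits the token at the delimiter, keeps the left part as the term, and stores the right part as a 4-byte big-endian float payload. Splitting at the first delimiter and leaving undelimited tokens (such as "leopard" in the example) without a payload reflect my understanding of DelimitedPayloadTokenFilter. This is not its source, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class DelimitedPayloadSketch {

    /** The indexed term text plus its payload (null when the token had no delimiter). */
    static final class TermWithPayload {
        final String term;
        final byte[] payload;

        TermWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /**
     * Mimics delimiter="|" encoder="float": split at the first delimiter,
     * parse the remainder as a float and store it as 4 big-endian bytes.
     */
    static TermWithPayload split(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        if (i < 0) {
            return new TermWithPayload(token, null); // no weight given, so no payload
        }
        float weight = Float.parseFloat(token.substring(i + 1));
        byte[] payload = ByteBuffer.allocate(4).putFloat(weight).array();
        return new TermWithPayload(token.substring(0, i), payload);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"Big_Cat|0.8", "leopard"}) {
            TermWithPayload t = split(token, '|');
            String weight = t.payload == null
                    ? "none"
                    : String.valueOf(ByteBuffer.wrap(t.payload).getFloat());
            System.out.println(t.term + " -> payload " + weight);
        }
    }
}
```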

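To contrast the two new query styles from the description, the Java sketch below models them in a simplified way. PICK_BEST_BOOST_BY_PAYLOAD is treated as a disjunction-max with a tie-breaker of 0, so only the best boosted clause counts. AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD is treated as a Boolean query of SHOULD clauses, so boosted clause scores add up. The tie-breaker value, the per-clause raw scores and all names are assumptions for illustration, not measurements or the actual implementation.

```java
import java.util.List;

public class SynonymQueryStyleSketch {

    /** A matching synonym clause: its unboosted score and its payload boost. */
    static final class Clause {
        final String term;
        final double rawScore;
        final double payloadBoost;

        Clause(String term, double rawScore, double payloadBoost) {
            this.term = term;
            this.rawScore = rawScore;
            this.payloadBoost = payloadBoost;
        }

        double boosted() {
            return rawScore * payloadBoost;
        }
    }

    /** PICK_BEST_BOOST_BY_PAYLOAD, modeled as disjunction-max with tie-breaker 0. */
    static double pickBest(List<Clause> matched) {
        double best = 0;
        for (Clause c : matched) {
            best = Math.max(best, c.boosted());
        }
        return best;
    }

    /** AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD, modeled as a sum of SHOULD clauses. */
    static double asDistinctTerms(List<Clause> matched) {
        double sum = 0;
        for (Clause c : matched) {
            sum += c.boosted();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Arbitrary illustrative raw scores; boosts taken from the synonym.txt example.
        List<Clause> matched = List.of(
                new Clause("tiger", 2.0, 1.0),
                new Clause("big_cat", 3.0, 0.8));
        System.out.println("pick best: " + pickBest(matched));
        System.out.println("distinct terms: " + asDistinctTerms(matched));
    }
}
```

Under this model, a document that matches several synonyms is rewarded for each of them in the distinct-terms style, but only for its strongest weighted match in the pick-best style.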


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
