[ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453250#comment-16453250 ]
Alessandro Benedetti edited comment on SOLR-12238 at 4/27/18 3:47 PM:
----------------------------------------------------------------------

[~ehatcher] thanks!

1) At the moment this implementation is entirely query time. When the query is parsed (built), the payload of each synonym is used to build a boosted synonym query.
Given that, an index-time approach could be interesting and could work.
Why did you mention that the delimiters could cause issues? The payload weight should be in the output synonyms, so they should not cause any matching problem.
Applying this at indexing time would require changing the similarity function to a payload-aware variant (such as PayloadBM25Similarity).
Looking at org.apache.lucene.search.similarities.BM25Similarity I noticed this:

  /** The default implementation returns <code>1</code> */
  protected float scorePayload(int doc, int start, int end, BytesRef payload) {
    return 1;
  }

So maybe someone has already started something in that direction. I will investigate and possibly open another Jira to track the index-time implementation.

2) I have just tried it and it works. I first added the weighted synonyms (with separator) with a REST PUT and verified that the managed synonym map was correct (I just pushed the additional test to the GitHub pull request).
Then I double-checked that ManagedSynonymGraphFilter builds the same graph as the non-managed version, and I saw the payload being extracted correctly.

was (Author: alessandro.benedetti):
[~ehatcher] thanks!

1) At the moment this implementation is entirely query time. When the query is parsed (built), the payload of each synonym is used to build a boosted synonym query.
Given that, an index-time approach could be interesting and could work.
Why did you mention that the delimiters could cause issues? The payload weight should be in the output synonyms, so they should not cause any matching problem.
Applying this at indexing time would require changing the similarity function to a payload-aware variant (such as PayloadBM25Similarity).
Looking at org.apache.lucene.search.similarities.BM25Similarity I noticed this:

  /** The default implementation returns <code>1</code> */
  protected float scorePayload(int doc, int start, int end, BytesRef payload) {
    return 1;
  }

So maybe someone has already started something in that direction. I will investigate and possibly open another Jira to track the index-time implementation.

2) I haven't tried, but ManagedSynonymGraphFilter should build the same graph as the non-managed version, so I assume it should work fine (I will experiment and double-check).

> Synonym Query Style Boost By Payload
> ------------------------------------
>
>                 Key: SOLR-12238
>                 URL: https://issues.apache.org/jira/browse/SOLR-12238
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>            Reporter: Alessandro Benedetti
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and
> makes it possible to boost synonym queries using the associated payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses
> boosted by payload
> These new synonym query styles assume payloads are available, so they must
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"
> delimiter="|"/>
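
As a rough illustration of the weighted synonym format described in the issue, the following Java sketch splits the right-hand side of a rule into term/weight pairs on the "|" delimiter. It is not the code from the patch and it does not use DelimitedPayloadTokenFilterFactory; the class name WeightedSynonymLine, the method parseOutputs, and the choice of a caller-supplied default weight for entries without a delimiter (such as "leopard" above) are assumptions made only for this example. Multi-word entries are kept whole here, which may differ from how an analysis chain would tokenize them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: not part of Solr or Lucene.
public class WeightedSynonymLine {

    /**
     * Parses the outputs of a rule such as
     * "tiger => tiger|1.0, Big_Cat|0.8" into an ordered term -> weight map.
     * Entries without the delimiter get defaultWeight (an assumption of this sketch).
     */
    public static Map<String, Float> parseOutputs(String rule, char delimiter, float defaultWeight) {
        int arrow = rule.indexOf("=>");
        // Keep only the right-hand side when an explicit mapping arrow is present.
        String rhs = arrow >= 0 ? rule.substring(arrow + 2) : rule;

        Map<String, Float> weights = new LinkedHashMap<>();
        for (String entry : rhs.split(",")) {
            String token = entry.trim();
            if (token.isEmpty()) {
                continue;
            }
            int cut = token.lastIndexOf(delimiter);
            if (cut < 0) {
                // No payload given for this synonym.
                weights.put(token, defaultWeight);
            } else {
                String term = token.substring(0, cut).trim();
                float weight = Float.parseFloat(token.substring(cut + 1).trim());
                weights.put(term, weight);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        String rule = "lion => lion|1.0, panthera leo|0.99, Simba|0.8";
        // Print each synonym with the weight that would be used as its boost.
        parseOutputs(rule, '|', 1.0f)
            .forEach((term, weight) -> System.out.println(term + " -> " + weight));
    }
}
```

In the query styles proposed in the issue, weights like these would become the boosts on the individual synonym clauses; how they are actually read from the token payloads is defined by the patch, not by this sketch.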