[ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Høydahl updated SOLR-13126:
-------------------------------
    Fix Version/s: 7.7.2

> Multiplicative boost of isn't applied when one of the summed or multiplied queries doesn't match
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13126
>                 URL: https://issues.apache.org/jira/browse/SOLR-13126
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 7.3, 7.4, 7.6, 7.7, 7.5.0, 7.7.1
>         Environment: Reproduced with macOS 10.14.1, a quick test with Windows 10 showed the same result.
>            Reporter: Thomas Aglassinger
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 7.7.2, 8.0
>
>         Attachments: 0001-use-deprecated-classes-to-fix-regression-introduced-.patch,
>                      0002-SOLR-13126-Added-test-case.patch, 2019-02-14_1715.png, SOLR-13126.patch,
>                      SOLR-13126.patch, debugQuery.json, image-2019-02-13-16-17-56-272.png,
>                      screenshot-1.png, solr_match_neither_nextteil_nor_sony.json,
>                      solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json,
>                      solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json,
>                      solr_match_netzteil_only.txt
>
>
> Under certain circumstances, search results from queries with multiple
> multiplicative boosts using the Solr functions {{product()}} and {{query()}}
> have a score that is inconsistent with the one from the debugQuery
> information. Only the debug score is correct, while the actual search
> results show a wrong score.
> This seems somewhat similar to the behaviour described in
> https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was
> resolved a while ago.
> A little background: we are using Solr as a search platform for the
> e-commerce framework SAP Hybris. There the shop administrator can create
> multiplicative boost rules (see below for an example) where a value like 2.0
> means that an item gets boosted to 200%.
> This works fine in the demo shop distributed by SAP but breaks in our shop.
> We encountered the issue when upgrading from Solr 7.2.1 / Hybris 6.7 to
> Solr 7.5 / Hybris 18.8.3 (which would have been named Hybris 6.8 but the
> version naming scheme changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case
> it contains certain terms:
> * "Netzteil" (power supply) is boosted to 200%
> * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also, the query function has the value 1 specified as default in case the name
> does not contain the respective term, resulting in a pseudo boost that
> preserves the score.
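The boost function in {{ymb}} follows a regular pattern: one {{query()}} per boost rule, each with a default of 1, all wrapped in {{product()}}. The following is only a minimal Python sketch of how such a parameter could be assembled from a list of rules. The helper name boost_function and its arguments are hypothetical and are not part of Solr or Hybris; the backslash escaping simply mirrors what appears in the reported parameters and is not claimed to be required.

```python
import json


def boost_function(field, rules):
    """Build a product(query(...),...) function string from (term, boost) rules.

    Hypothetical helper for illustration only.
    """
    parts = []
    for term, boost in rules:
        # ':' and '^' are backslash-escaped exactly as in the reported
        # parameter; '^=' gives the clause a constant score equal to the boost.
        clause = f'{field}\\:{term}\\^={boost}'
        # The second argument of query() is the value used for documents
        # that do not match the clause (1 keeps the score unchanged).
        parts.append(f'query({{!v="{clause}"}},1)')
    return f'product({",".join(parts)})'


ymb = boost_function("name_text_de", [("Netzteil", 2.0), ("Sony", 3.0)])

params = {
    "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
    "ymb": ymb,
    "yq": "*:*",
    "sort": "score desc",
    "debugQuery": "true",
}

# Print the JSON-escaped form for comparison with responseHeader.params.
print(json.dumps(params["ymb"]))
```

The JSON-escaped value printed at the end can be compared character by character with the {{ymb}} entry quoted above.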
> According to the debug information the parser used is the LuceneQParser,
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among others the following products
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>       {
>         "id":"someProducts/Online/test7111111",
>         "name_text_de":"Original Sony Vaio Netzteil",
>         "code_string":"test7111111",
>         // CORRECT, both "Netzteil" and "Sony" are included in the name
>         "score":6.0},
>       {
>         "id":"someProducts/Online/taxTestingProductThree",
>         "name_text_de":"Steuertestprodukt Zwei",
>         "code_string":"taxTestingProductThree",
>         // CORRECT, neither "Netzteil" nor "Sony" are included in the name
>         "score":1.0},
>       {
>         "id":"someProducts/Online/797856300000",
>         "name_text_de":"GS-Netzteil 20W schwarz",
>         "code_string":"797856300000",
>         // WRONG, "Netzteil" is part of the name;
>         // note that we do split words on hyphen because
>         // WordDelimiterGraphFilterFactory.generateWordParts="1"
>         "score":1.0},
> {code}
> So apparently the multiplicative boost works for product names where all the
> boosted terms are included but fails if only one of the terms matches.
> There are also other products in the result that contain either "Netzteil" or
> "Sony" but still get a score of 1.0 instead of 2.0 or 3.0, respectively.
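To make the discrepancy explicit, here is a small Python sketch that models the intended semantics (each matching rule contributes its boost, each non-matching rule contributes the default 1) and compares the result with the scores reported above. It rests on two assumptions: the tokenizer is a crude stand-in for the field's analyzer (split on non-word characters, lowercase), and the match-all base query contributes 1.0 so that the final score equals the boost. The names RULES, tokens and expected_boost are illustrative only.

```python
import math
import re

# Boost rules from the report: lowercased term -> multiplicative boost.
RULES = {"netzteil": 2.0, "sony": 3.0}


def tokens(name):
    # Assumption: rough approximation of the analyzer, which splits on
    # hyphens and whitespace and lowercases the parts.
    return {t.lower() for t in re.split(r"\W+", name) if t}


def expected_boost(name):
    toks = tokens(name)
    # A rule that does not match contributes the default value 1.0.
    return math.prod(b if term in toks else 1.0 for term, b in RULES.items())


# (name_text_de, score as returned in the search result) from the report.
reported = [
    ("Original Sony Vaio Netzteil", 6.0),
    ("Steuertestprodukt Zwei", 1.0),
    ("GS-Netzteil 20W schwarz", 1.0),
]

mismatches = [
    (name, score, expected_boost(name))
    for name, score in reported
    if score != expected_boost(name)
]

for name, got, want in mismatches:
    print(f"{name!r}: result score {got}, expected {want}")
```

Under these assumptions only the product matching a single term is flagged, which is consistent with the analysis in the JSON comments.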
> Surprisingly, in the {{explain}} segment the score for the product with
> "Netzteil" but without "Sony" is correctly 2.0:
> {code:java}
> 2.0 = product of:
>   1.0 = boost
>   2.0 = product of:
>     1.0 = *:*
>     2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
> {code}
> The type definition of {{text_de}} in the {{schema.xml}} (which is used for
> "name_text_de") includes the following filters:
> {code:xml}
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> The {{solrconfig.xml}} is mostly taken from the Hybris defaults and AFAIK
> does not do anything kinky. The following lines might be of interest:
> {code:xml}
> <luceneMatchVersion>7.5.0</luceneMatchVersion>
> <queryParser name="multiMaxScore"
>              class="de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin"/>
> {code}
> To sum it up, my expectation would have been:
> * The scores in the result and explain sections are identical.
> * Names matching only one of the two multiplied boost terms receive the
> respective single boost instead of the default score 1.0.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org