Re: SOLR Score Range Changed
On 2/23/2018 2:28 PM, Hodder, Rick wrote:
> Combining everything into one query is what I'd prefer because as you said,
> one would think that with everything in the same query, the score would
> organize everything nicely.

I don't recall writing anything like that. How did you infer that from what I wrote? One thing that you can infer from what I said is that comparing scores from multiple queries is not going to do what you think it will do. That leads into the next thing I'll quote from your message:

> So the way we had addressed it was running 3 separate SOLR queries and
> combining them and sorting them by descending score - wasn’t perfect, but it
> worked, and helped me to reduce the number of results we hand off to a
> scoring engine that applies 3 algorithms (Monge-Elkan, Jaro-Winkler, and
> SmithWindowed Affine) to further hone the results - which can take LOTS of
> time if there are a lot of results, so

It seems that you didn't finish your sentence, and may not have even finished the message, as this was the last thing you wrote.

Running three separate queries and then trying to combine them based on score is not something you should ever attempt, because, as I mentioned before, the absolute score of a document in a result is only meaningful for that specific query done at that moment. Even the same query run later, after something has changed, might have a very different score range.

Thanks,
Shawn
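The point that the same query can produce a different score range later can be illustrated with the IDF part of BM25. This is a simplified sketch, assuming the IDF form used by Lucene's BM25Similarity, log(1 + (N - df + 0.5) / (df + 0.5)), and the k1 = 1.2, b = 0.75 defaults; the document counts are made-up numbers, not data from this thread.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF in the form assumed for Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Simplified score of one query term for one document (no boosts, no norm encoding)."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_count, doc_freq) * freq * (k1 + 1) / (freq + norm)


# Same document, same query term, before and after the index changes.
# Hypothetical numbers: 10,000 docs with 50 containing the term, then 5,000
# more docs are added, 2,000 of which contain it. avg_doc_len is held
# constant here for simplicity.
before = bm25_term_score(freq=1, doc_len=3, avg_doc_len=4, doc_count=10_000, doc_freq=50)
after = bm25_term_score(freq=1, doc_len=3, avg_doc_len=4, doc_count=15_000, doc_freq=2_050)
print(f"before: {before:.3f}  after: {after:.3f}")
```

Nothing about the document or the query changed, yet its absolute score dropped because the term became more common in the index. Its position relative to other documents matching the same query can stay the same.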
RE: SOLR Score Range Changed
ClassicSimilarity helped, but the score ranges still don’t have a minimum near 0 the way they did in version 4. Are there other attributes/elements of this factory that could get me back the old behavior?

-----Original Message-----
From: Joël Trigalo [mailto:jtrig...@gmail.com]
Sent: Friday, February 23, 2018 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Score Range Changed

The difference seems to be due to the fact that the default similarity in Solr 7 is BM25, while it used to be TF-IDF in Solr 4. As you realised, the BM25 function is smoother. You can configure schema.xml to use ClassicSimilarity; see, for instance:

https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, maybe you are relying on properties that are not guaranteed, so it would be better to change the score function or the sorting (rather than going back to ClassicSimilarity).
RE: SOLR Score Range Changed
Hi Shawn,

Thanks for your help - I'm still finding my way in the weeds of SOLR.

Combining everything into one query is what I'd prefer because as you said, one would think that with everything in the same query, the score would organize everything nicely.

>>Assuming you're using the default relevancy sort

Yes

>> does the order of your search results change dramatically from one version
>> to the other? If it does, is the order generally better from a relevance
>> standpoint, or generally worse? If you are specifying an explicit sort,
>> then the scores will likely be ignored.

Here's what we do - we have a list of policies with names (among other things, but I'll just use names for an example). We search for several business names to see if we have policies in common with those names, so that we don’t take on too much risk with them.

So let's say I'm doing a search against three business names:

Bob's carpentry
Consolidated carpentry of the Greater North West
Carpentry Land

q=(IDX_CompanyName:bob's AND carpentry) OR (IDX_CompanyName: consolidated AND carpentry AND of AND the AND Greater AND North AND West) OR (IDX_CompanyName: Carpentry AND Land)

Requesting 750 rows returns hits that are all focused on Consolidated (seemingly because the larger number of words pushes the SOLR score into a higher range for all Consolidated results, as mentioned in my previous email). Searching for all 3 names at the same time doesn’t ensure that all 3 companies will be in the results, even though each one returns results when run separately.
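One detail of the query above is worth checking: in the standard Lucene/Solr query parser syntax, a field prefix applies only to the term immediately after it, so `IDX_CompanyName:bob's AND carpentry` searches `carpentry` in the default field, not in `IDX_CompanyName`. Grouping each name's terms in parentheses after the field avoids that. Below is a minimal sketch of building such a query; the helper names are hypothetical, and the escape set assumes the standard query parser's special characters.

```python
# Characters with special meaning in the standard Lucene query parser syntax.
# && and || are two-character operators; escaping each & and | individually is also safe.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def name_clause(field: str, name: str) -> str:
    """Require every word of `name` in `field`, written as field:(a AND b)."""
    terms = [escape_term(word) for word in name.split()]
    return f"{field}:({' AND '.join(terms)})"


def build_query(field: str, names: list[str]) -> str:
    """OR together one grouped clause per business name."""
    return " OR ".join(f"({name_clause(field, n)})" for n in names)


names = ["Bob's carpentry", "Carpentry Land"]
print(build_query("IDX_CompanyName", names))
# (IDX_CompanyName:(Bob's AND carpentry)) OR (IDX_CompanyName:(Carpentry AND Land))
```

Grouping fixes which field each term is searched in; it does not by itself stop the clause for a longer name from contributing more to the score than the clause for a shorter one.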
If I raise maxrows to 4000, I see a few Bob's carpentry results, but most are still Consolidated.

So the way we had addressed it was running 3 separate SOLR queries and combining them and sorting them by descending score - wasn’t perfect, but it worked, and helped me to reduce the number of results we hand off to a scoring engine that applies 3 algorithms (Monge-Elkan, Jaro-Winkler, and SmithWindowed Affine) to further hone the results - which can take LOTS of time if there are a lot of results, so

> What I am describing is also why it's strongly recommended that you never
> try to convert scores to percentages:
>
> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> Thanks,
> Shawn
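Since only the order within a single query's results is meaningful, one way to keep all three companies represented without comparing scores across queries is to take results from each query's ranked list in turn (round-robin) and drop duplicates. This is a sketch of a technique not proposed in the thread; the document IDs and the `per_query_limit` parameter are hypothetical, and each input list is assumed to already be in its own query's relevance order.

```python
from itertools import zip_longest


def round_robin_merge(ranked_lists: list[list[str]], per_query_limit: int) -> list[str]:
    """Interleave the top `per_query_limit` IDs of each ranked list, ignoring scores.

    Only each list's own order is used, so a score from one query is never
    compared with a score from another query.
    """
    truncated = [ids[:per_query_limit] for ids in ranked_lists]
    merged: list[str] = []
    seen: set[str] = set()
    # zip_longest pads shorter lists with None, so each "row" is one rank position.
    for row in zip_longest(*truncated):
        for doc_id in row:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical IDs returned by three separate queries, best match first.
bobs = ["bob-1", "bob-2", "bob-3"]
consolidated = ["con-1", "con-2", "con-3", "con-4"]
land = ["land-1", "land-2"]
print(round_robin_merge([bobs, consolidated, land], per_query_limit=2))
# ['bob-1', 'con-1', 'land-1', 'bob-2', 'con-2', 'land-2']
```

The per-query limit caps how many candidates each name contributes to the downstream matching step, regardless of how high or low that query's raw scores happen to be.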
Re: SOLR Score Range Changed
The difference seems to be due to the fact that the default similarity in Solr 7 is BM25, while it used to be TF-IDF in Solr 4. As you realised, the BM25 function is smoother. You can configure schema.xml to use ClassicSimilarity; see, for instance:

https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, maybe you are relying on properties that are not guaranteed, so it would be better to change the score function or the sorting (rather than going back to ClassicSimilarity).

2018-02-22 18:39 GMT+01:00 Shawn Heisey :

> On 2/22/2018 9:50 AM, Hodder, Rick wrote:
>
>> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>>
>> All seems to be going well, except for one thing: the score that is
>> coming back for the resulting documents is giving different scores.
>>
>
> The absolute score has no meaning when you change something -- the index,
> the query, the software version, etc. You can't compare absolute scores.
>
> What matters is the relative score of one document to another *in the same
> query*. The amount of difference is almost irrelevant -- the goal of
> Lucene's score calculation gymnastics is to have one document score higher
> than another, so the *order* is reasonably correct.
>
> Assuming you're using the default relevancy sort, does the order of your
> search results change dramatically from one version to the other? If it
> does, is the order generally better from a relevance standpoint, or
> generally worse? If you are specifying an explicit sort, then the scores
> will likely be ignored.
>
> What I am describing is also why it's strongly recommended that you never
> try to convert scores to percentages:
>
> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> Thanks,
> Shawn
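Joël's remark that BM25 is "smoother" mostly concerns how term frequency is treated. The sketch below compares simplified per-term tf factors: the classic TF-IDF sqrt(freq), which keeps growing, and the BM25 factor freq*(k1+1)/(freq + k1*(1 - b + b*dl/avgdl)), which saturates below k1 + 1. The values k1 = 1.2 and b = 0.75 are assumed to be Lucene's BM25 defaults; idf, boosts, and the real implementations' length-norm encoding are left out.

```python
import math

K1 = 1.2   # assumed Lucene BM25 default
B = 0.75   # assumed Lucene BM25 default


def classic_tf(freq: float) -> float:
    """TF factor of the classic TF-IDF formula: grows without bound."""
    return math.sqrt(freq)


def bm25_tf(freq: float, doc_len: float = 10.0, avg_doc_len: float = 10.0) -> float:
    """BM25 TF factor: approaches K1 + 1 as freq grows."""
    norm = K1 * (1 - B + B * doc_len / avg_doc_len)
    return freq * (K1 + 1) / (freq + norm)


for freq in (1, 2, 4, 16, 64):
    print(f"freq={freq:>3}  classic={classic_tf(freq):6.3f}  bm25={bm25_tf(freq):6.3f}")
```

Because each term's BM25 contribution is capped, the scores of different documents tend to cluster in a narrower band than under the classic formula, which is consistent with the changed score ranges discussed in this thread.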
Re: SOLR Score Range Changed
On 2/22/2018 9:50 AM, Hodder, Rick wrote:
> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>
> All seems to be going well, except for one thing: the score that is
> coming back for the resulting documents is giving different scores.

The absolute score has no meaning when you change something -- the index, the query, the software version, etc. You can't compare absolute scores.

What matters is the relative score of one document to another *in the same query*. The amount of difference is almost irrelevant -- the goal of Lucene's score calculation gymnastics is to have one document score higher than another, so the *order* is reasonably correct.

Assuming you're using the default relevancy sort, does the order of your search results change dramatically from one version to the other? If it does, is the order generally better from a relevance standpoint, or generally worse? If you are specifying an explicit sort, then the scores will likely be ignored.

What I am describing is also why it's strongly recommended that you never try to convert scores to percentages:

https://wiki.apache.org/lucene-java/ScoresAsPercentages

Thanks,
Shawn
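To see why converting scores to percentages misleads, consider the common "divide by the top score" normalization. The sketch below uses made-up score lists: whatever the raw values, the top hit of every query becomes 100%, so a weak best match in one query looks as good as a strong best match in another. The function name is hypothetical.

```python
def as_percentages(scores: list[float]) -> list[float]:
    """Scale scores so the best hit is 100 (the approach the wiki page warns against)."""
    top = max(scores)
    return [100.0 * (s / top) for s in scores]


strong_query = [12.4, 11.9, 3.1]   # hypothetical raw scores
weak_query = [0.8, 0.2, 0.1]       # hypothetical raw scores

print(as_percentages(strong_query))
print(as_percentages(weak_query))
# Both lists start with 100.0, even though the raw top scores differ by more than 10x.
```

The normalized numbers carry no more information than the order already did, and they invite exactly the cross-query comparison that the absolute scores do not support.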