Re: SOLR Score Range Changed

2018-02-26 Thread Shawn Heisey
On 2/23/2018 2:28 PM, Hodder, Rick wrote:
> Combining everything into one query is what I'd prefer because as you said, 
> one would think that with everything in the same query, the score would 
> organize everything nicely.

I don't recall writing anything like that.  How did you infer that from
what I wrote?  One thing that you can infer from what I said is that
comparing scores from multiple queries is not going to do what you think
it will do.  Which leads into the next thing I'll quote from your message:

> So the way we had addressed it was running 3 separate SOLR queries and 
> combining them and sorting them by descending score - wasn’t perfect, but it 
> worked, and helped me to reduce the number of results we hand off to a 
> scoring engine that applies 3 algorithms (Monge-Elkan, Jaro-Winkler, and 
> SmithWindowed Affine) to further hone the results - which can take LOTS of 
> time if there are a lot of results, so 

It seems that you didn't finish your sentence, and may not have even
finished the message, as this was the last thing you wrote.

Running three separate queries and then trying to combine them based on
score is not something you should ever attempt, because as I mentioned
before, the absolute score of a document in a result is only meaningful
for that specific query done at that moment.  Even the same query done
later after something has changed might have a very different score range.
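
To illustrate the point, here is a minimal Python sketch using made-up (doc_id, score) pairs; the IDs, scores, and the interleave_by_rank helper are all hypothetical. It shows how sorting the combined lists by raw score lets the query with the highest score range crowd out the others. The round-robin merge by rank is only one possible alternative that avoids comparing scores across queries, not something recommended in this thread.

```python
from itertools import zip_longest

def interleave_by_rank(*result_lists):
    """Round-robin merge of per-query result lists.

    Each list keeps its own internal order (which is meaningful);
    scores are never compared across lists (which is not).
    """
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None:  # this list is shorter than the others
                continue
            doc_id = hit[0]
            if doc_id not in seen:  # a document may match several queries
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Hypothetical results, each already sorted by score within its own query.
bobs = [("b1", 2.1), ("b2", 1.7)]
consolidated = [("c1", 9.8), ("c2", 9.1), ("c3", 8.4)]
land = [("l1", 3.0), ("b2", 2.2)]

# Sorting the combined hits by raw score: the long query dominates the top.
by_raw_score = [
    doc_id
    for doc_id, _ in sorted(bobs + consolidated + land,
                            key=lambda hit: hit[1], reverse=True)
]

# Merging by rank: every query contributes its best hits first.
by_rank = interleave_by_rank(bobs, consolidated, land)
```

With these made-up numbers, the first three entries of by_raw_score all come from the longest query, while by_rank starts with the top hit of each query.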

Thanks,
Shawn



RE: SOLR Score Range Changed

2018-02-23 Thread Hodder, Rick
ClassicSimilarity helped, but the range of score values doesn't have a minimum 
near 0 the way it did in Solr 4.



Are there other attributes/elements to this factory that could get me back the 
old functionality?

-Original Message-
From: Joël Trigalo [mailto:jtrig...@gmail.com] 
Sent: Friday, February 23, 2018 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Score Range Changed

The difference seems to be due to the default similarity: in Solr 7 it is
BM25, while in Solr 4 it was TF-IDF. As you realised, the BM25 function is 
smoother.
You can configure schema.xml to use ClassicSimilarity; see, for instance: 
https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, you may be relying on properties of the score that are not 
guaranteed, so it would be better to change the score function or the sorting 
(rather than going back to ClassicSimilarity).



RE: SOLR Score Range Changed

2018-02-23 Thread Hodder, Rick
Hi Shawn,

Thanks for your help - I'm still finding my way in the weeds of SOLR.

Combining everything into one query is what I'd prefer because as you said, one 
would think that with everything in the same query, the score would organize 
everything nicely.

>>Assuming you're using the default relevancy sort
Yes

>> does the order of your search results change dramatically from one version 
>> to the other?  If it does, is the order generally better from a relevance 
>> standpoint, or generally worse?  If you are specifying an explicit sort, 
>> then the scores will likely be ignored.

Here's what we do - we have a list of policies with names (among other things, 
but I'll just use names for an example).

We search for several business names to see if we have policies in common with 
the names so that we don’t have too much risk with them.

So let's say I'm doing a search against three business names:

Bob's carpentry
Consolidated carpentry of the Greater North West
Carpentry Land

q=(IDX_CompanyName:bob's AND carpentry) OR (IDX_CompanyName: consolidated AND 
carpentry AND of AND the AND Greater AND North AND West) OR (IDX_CompanyName: 
Carpentry AND Land)
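
A side note on the query above, assuming it goes through the standard Lucene query parser: a field prefix applies only to the term (or parenthesized group) that immediately follows it, so in `IDX_CompanyName:bob's AND carpentry` the word `carpentry` is searched in the default field rather than in IDX_CompanyName. If the intent is to match every word against IDX_CompanyName, the terms can be grouped. The Python sketch below builds such a query string; the helper names (escape_term, name_clause, build_query) are hypothetical and not part of any Solr client.

```python
import re

# Characters the Lucene/Solr query syntax treats as special; each one is
# escaped with a backslash so that it is matched literally.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def escape_term(term):
    """Backslash-escape query-syntax characters inside a single term."""
    return _SPECIAL.sub(r"\\\1", term)

def name_clause(field, name):
    """Require every word of `name` in `field`: field:(w1 AND w2 ...)."""
    terms = [escape_term(word) for word in name.split()]
    return f"{field}:({' AND '.join(terms)})"

def build_query(field, names):
    """OR together one grouped clause per business name."""
    return " OR ".join(name_clause(field, name) for name in names)

query = build_query(
    "IDX_CompanyName",
    [
        "Bob's carpentry",
        "Consolidated carpentry of the Greater North West",
        "Carpentry Land",
    ],
)
print(query)
```

Grouping only changes which field each term is matched against. It does not make scores from the three clauses comparable, so the longer name can still dominate the top of the results.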

Searching for 750 rows returns hits that are all focused on Consolidated 
(seemingly because the number of words pushes the SOLR score into a higher 
range for all Consolidated results, as mentioned in my previous email). 
Searching for all 3 names at the same time doesn't ensure that all 3 companies 
will be in the results, even though each one returns results when run 
separately. If I boost maxrows to 4000, I see a few Bob's Carpentry hits, but 
most are still Consolidated.

So the way we had addressed it was running 3 separate SOLR queries and 
combining them and sorting them by descending score - wasn’t perfect, but it 
worked, and helped me to reduce the number of results we hand off to a scoring 
engine that applies 3 algorithms (Monge-Elkan, Jaro-Winkler, and SmithWindowed 
Affine) to further hone the results - which can take LOTS of time if there are 
a lot of results, so 


>> What I am describing is also why it's strongly recommended that you never 
>> try to convert scores to percentages:
>>
>> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>>
>> Thanks,
>> Shawn



Re: SOLR Score Range Changed

2018-02-23 Thread Joël Trigalo
The difference seems to be due to the default similarity: in Solr 7 it is
BM25, while in Solr 4 it was TF-IDF. As you realised, the BM25 function
is smoother.
You can configure schema.xml to use ClassicSimilarity; see, for instance:
https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity

But as said before, you may be relying on properties of the score that are
not guaranteed, so it would be better to change the score function or the
sorting (rather than going back to ClassicSimilarity).
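
For reference, a minimal sketch of the schema change described in the links above. It assumes a single global similarity declared at the top level of the schema is enough; a per-field-type declaration is also possible, and the exact placement depends on your existing schema.

```xml
<!-- schema.xml (or managed-schema), as a direct child of <schema>.
     Assumption: no field type needs its own similarity. -->
<similarity class="solr.ClassicSimilarityFactory"/>
```

A change of similarity generally calls for a reindex so that index-time statistics match what the similarity expects.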

2018-02-22 18:39 GMT+01:00 Shawn Heisey :

> On 2/22/2018 9:50 AM, Hodder, Rick wrote:
>
>> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>>
>> All seems to be going well, except for one thing: the scores coming back
>> for the resulting documents are different.
>>
>
> The absolute score has no meaning when you change something -- the index,
> the query, the software version, etc.  You can't compare absolute scores.
>
> What matters is the relative score of one document to another *in the same
> query*.  The amount of difference is almost irrelevant -- the goal of
> Lucene's score calculation gymnastics is to have one document score higher
> than another, so the *order* is reasonably correct.
>
> Assuming you're using the default relevancy sort, does the order of your
> search results change dramatically from one version to the other?  If it
> does, is the order generally better from a relevance standpoint, or
> generally worse?  If you are specifying an explicit sort, then the scores
> will likely be ignored.
>
> What I am describing is also why it's strongly recommended that you never
> try to convert scores to percentages:
>
> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> Thanks,
> Shawn
>
>


Re: SOLR Score Range Changed

2018-02-22 Thread Shawn Heisey

On 2/22/2018 9:50 AM, Hodder, Rick wrote:

> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>
> All seems to be going well, except for one thing: the scores coming back 
> for the resulting documents are different.


The absolute score has no meaning when you change something -- the 
index, the query, the software version, etc.  You can't compare absolute 
scores.


What matters is the relative score of one document to another *in the 
same query*.  The amount of difference is almost irrelevant -- the goal 
of Lucene's score calculation gymnastics is to have one document score 
higher than another, so the *order* is reasonably correct.


Assuming you're using the default relevancy sort, does the order of your 
search results change dramatically from one version to the other?  If it 
does, is the order generally better from a relevance standpoint, or 
generally worse?  If you are specifying an explicit sort, then the 
scores will likely be ignored.
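
One way to answer that question with data rather than by eye is to compare the order of the top document IDs returned by each version for the same query. The Python sketch below is only an illustration: compare_rankings is a hypothetical helper, and the ID lists are made up rather than taken from a real index.

```python
def compare_rankings(old_ids, new_ids, k=10):
    """Summarise how the top-k document order differs between two result lists.

    Only document IDs and their positions are used; scores are ignored,
    because they are not comparable across versions.
    """
    old_top, new_top = list(old_ids[:k]), list(new_ids[:k])
    old_pos = {doc_id: i for i, doc_id in enumerate(old_top)}
    new_pos = {doc_id: i for i, doc_id in enumerate(new_top)}
    shared = [doc_id for doc_id in old_top if doc_id in new_pos]
    return {
        # Fraction of the top-k that appears in both result lists.
        "overlap": len(shared) / max(len(old_top), len(new_top), 1),
        # Shared documents whose position changed: id -> (old, new).
        "moved": {d: (old_pos[d], new_pos[d])
                  for d in shared if old_pos[d] != new_pos[d]},
        "dropped": [d for d in old_top if d not in new_pos],
        "added": [d for d in new_top if d not in old_pos],
    }

# Hypothetical top-4 IDs for the same query on each version.
solr4_ids = ["p1", "p2", "p3", "p4"]
solr7_ids = ["p2", "p1", "p3", "p5"]
report = compare_rankings(solr4_ids, solr7_ids, k=4)
print(report)
```

Whether a given amount of movement counts as "dramatic" still needs a human judgment about relevance; the numbers only show where to look.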


What I am describing is also why it's strongly recommended that you 
never try to convert scores to percentages:


https://wiki.apache.org/lucene-java/ScoresAsPercentages

Thanks,
Shawn