RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
I agree! I also share the thought experiment, and these changes do seem 
justifiable. It is just that I am interested in the evidence.

Again, Webster, if you can, please share some significant figures. They should 
be interesting!

Regards, M. 
 

Re: Solr 6 and IDF

2017-08-08 Thread Walter Underwood
There are good use cases for disabling idf and even tf for labels and 
categories.

Searching resumes, maybe you care that “microsoft word” is less selective than 
“r programming”, but maybe you want all the ones that match three skills 
followed by the ones that match two skills, regardless of how common those 
skills are.

And for tf, a document tagged with both “new york” and “new york city” is not 
twice as much about New York. Same for the movie “New York, New York”.
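
A minimal sketch of the "three skills before two skills" ordering, assuming tf and idf are both fixed at 1 and norms are omitted, so that a document's score for an OR query reduces to roughly the number of query terms it matches. The class and method names and the sample tags are hypothetical, and this is plain Java rather than Lucene or Solr code.

```java
import java.util.List;
import java.util.Set;

final class SkillMatch {
    // Counts how many of the queried skills appear among the document's tags.
    // Each matching skill contributes exactly 1, however common or rare it is.
    static int matchCount(Set<String> docTags, List<String> querySkills) {
        int count = 0;
        for (String skill : querySkills) {
            if (docTags.contains(skill)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> query = List.of("microsoft word", "r programming", "sql");
        Set<String> resumeA = Set.of("microsoft word", "sql", "excel");
        Set<String> resumeB = Set.of("r programming");
        System.out.println(matchCount(resumeA, query)); // prints 2
        System.out.println(matchCount(resumeB, query)); // prints 1
    }
}
```

Here the resume matching two common skills ranks above the one matching a single rare skill, which is the behaviour idf would otherwise work against.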

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
Do you measure MRR (mean reciprocal rank) or sales conversion right now? It 
would be interesting to see whether the graph changes after your modification. 
Please let us know!
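
For reference, a small sketch of how MRR could be computed from logged searches, assuming you record, per query, the 1-based rank of the first relevant result (0 if none was returned). The class name and the rank values are made up for illustration; they are not measurements from this thread.

```java
final class Mrr {
    // ranks[i] is the 1-based position of the first relevant result for query i,
    // or 0 if no relevant result was returned (contributes 0 to the sum).
    static double meanReciprocalRank(int[] ranks) {
        if (ranks.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int rank : ranks) {
            if (rank > 0) {
                sum += 1.0 / rank;
            }
        }
        return sum / ranks.length;
    }

    public static void main(String[] args) {
        int[] before = {1, 3, 0, 2};   // hypothetical ranks with idf enabled
        int[] after = {1, 2, 4, 1};    // hypothetical ranks with idf disabled
        System.out.printf("MRR before: %.3f%n", meanReciprocalRank(before));
        System.out.printf("MRR after:  %.3f%n", meanReciprocalRank(after));
    }
}
```

Running the same query set against both configurations and comparing the two numbers gives a simple before/after figure.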
 

Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
I think just disabling idf is what we want. For product search we really
don't want to boost a match just because the term is rare. Analyzing our
results, we see that some good hits are suppressed (they get lower scores)
because of idf.

For now this is so we can test the idea. We think it will help, but we'll see.


RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
Yes, extend the default Similarity, override idf to return 1.0f (and probably 
the idfExplain methods as well), and configure it in your schema, globally or 
per field.

If you think this is a good idea, why not also return 1.0f for tf? And while 
you're at it, why not set omitNorms on all fields too?

I am curious if this is going to help you, please let us know!
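
A rough sketch of that override pattern, using self-contained stand-in classes rather than Lucene's own. `Bm25Scorer`, `NoIdfBm25Scorer` and `MatchOnlyScorer` are hypothetical names, and the formulas are the textbook BM25 ones with the usual k1 = 1.2 and b = 0.75 defaults. In a real deployment the subclass would extend the Lucene similarity class and be referenced from the schema.

```java
// Stand-in for a BM25 similarity; NOT Lucene's BM25Similarity.
class Bm25Scorer {
    protected static final double K1 = 1.2;
    protected static final double B = 0.75;

    // Hook a subclass can override, analogous to the idf method discussed above.
    protected double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Term-frequency part, including document-length normalization.
    protected double tf(double freq, double docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return freq * (K1 + 1.0) / (freq + norm);
    }

    // Score of one query term against one document.
    final double score(double freq, double docLen, double avgDocLen,
                       long docFreq, long docCount) {
        return idf(docFreq, docCount) * tf(freq, docLen, avgDocLen);
    }
}

// Same scorer with idf switched off: rare and common terms weigh the same.
class NoIdfBm25Scorer extends Bm25Scorer {
    @Override
    protected double idf(long docFreq, long docCount) {
        return 1.0;
    }
}

// idf and tf both flattened: any matching term contributes exactly 1,
// regardless of frequency or document length.
class MatchOnlyScorer extends NoIdfBm25Scorer {
    @Override
    protected double tf(double freq, double docLen, double avgDocLen) {
        return freq > 0 ? 1.0 : 0.0;
    }
}
```

With `NoIdfBm25Scorer`, a term found in 20 documents and one found in 20,000 score the same for equal term frequency and length; `MatchOnlyScorer` goes one step further, which is roughly what returning 1.0f for tf and omitting norms would amount to.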
 

Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
It appears that all I need to do is create a class that
extends BM25Similarity, and have the new class return 1 as the idf. Is that
correct?


Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
I do want to use BM25, just disable IDF


-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


RE: Solr 6 and IDF

2017-08-08 Thread Peter Lancaster
Hi Webster,

If you're not set on using the BM25 similarity, you should be able to 
continue as you were before: provide your own similarity class that extends 
ClassicSimilarity, override the idf method to always return 1, and then 
reference that class in your schema
e.g.


As far as I know, you've been able to have different similarities per field in 
Solr for a while now. https://wiki.apache.org/solr/SchemaXml#Similarity

Cheers,
Peter Lancaster.



This message is confidential and may contain privileged information. You should 
not disclose its contents to any other person. If you are not the intended 
recipient, please notify the sender named above immediately. It is expressly 
declared that this e-mail does not constitute nor form part of a contract or 
unilateral obligation. Opinions, conclusions and other information in this 
message that do not relate to the official business of findmypast shall be 
understood as neither given nor endorsed by it.


__

This email has been checked for virus and other malicious content prior to 
leaving our network.
__


Solr 6 and IDF

2017-08-08 Thread Webster Homer
Our most common use for Solr is searching for products, not text search. My
company is in the process of migrating away from an Endeca search engine. To
keep the business happy, the goal is to make sure that search results from
the two engines are fairly similar. One thing we have found that keeps a
result from ranking as well as it did in the old system is the idf.

We are using Solr 6. After moving to it, a lot of our results got better,
but idf still seems to deaden some results. Given that our focus is product
searching, I really don't see a need for idf at all. Before Solr 6 you
could suppress idf by providing a custom similarity class. Looking over the
newer documentation, a lot of things have improved, but I'm not sure I see a
simple way to turn off idf in Solr 6's BM25 similarity.

How do I disable IDF in Solr 6?

We also have text-search needs, so it would be nice if we could suppress IDF
at the field or schema level.
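
To make the effect concrete, here is a small sketch of how a BM25-style idf weights a rare term against a common one. It uses the textbook formula ln(1 + (N - n + 0.5) / (n + 0.5)); the catalog size and document frequencies are made-up numbers, and the class and method names are hypothetical rather than anything from Solr.

```java
final class IdfDemo {
    // BM25-style idf, where docCount (N) is the number of documents in the index
    // and docFreq (n) is the number of documents containing the term.
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 100_000;   // hypothetical catalog size
        long commonTerm = 20_000;  // a word that appears in many product names
        long rareTerm = 20;        // a word that appears in only a few products
        System.out.printf("idf(common) = %.3f%n", bm25Idf(commonTerm, docCount));
        System.out.printf("idf(rare)   = %.3f%n", bm25Idf(rareTerm, docCount));
    }
}
```

The rare term gets a several-times larger weight, so a product that matches only the common word can be pushed well down the list even when it is the better hit.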
