Re: optimize boosting parameters

2020-12-08 Thread Derek Poh
We monitor the response time of the page that uses these boosting 
parameters with Pingdom. Since the addition of these boosting parameters 
and an additional field to search on (which I will start a separate thread 
about on the mailing list), the average page response time has increased 
by 1-2 seconds.

Management has given feedback on this.


> If it does turn out to be the boosting (and IIRC the
> map function can be expensive), can you pre-compute some
> number of the boosts? Your requirements look
> like they can be computed at index time, then boost
> by just the value of the pre-computed field.
I have gone through the list of functions and the map function is the 
only one that can meet the requirements.

Or is there a less expensive function that I missed?

By pre-computing some number of the boosts, do you mean that at the 
preparation stage before indexing, I check the value of 
P_SupplierResponseRate and, if the value = 3, specify 'boost="0.4"' for 
the field of the document?



> BTW, boosts < 1.0
> _reduce_ the score. I mention that just in case that’s a surprise ;)
Oh, so it reduces the score?! It does not increase the score (by 
multiplying or adding) by a value less than 1?



> You use termfreq, which changes of course, but
> 1> if your corpus is updated often enough, the termfreqs will be relatively
>    stable. In that case you can pre-compute them too.
We do incremental indexing every half an hour on this collection, 
averaging 50K-100K documents per indexing run. The collection has 7+ 
million documents.

So the entire corpus does not get updated in every indexing run.


> 2> your problem statement has nothing to do with termfreq so why are you
>    using it in the first place?
I read up on the termfreq function again. It returns the number of times 
the term appears in the field for that document. It does not really fit 
the requirements. Thank you for pointing it out.

I should use map instead?
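
One possible reading of this question, sketched under assumptions rather than as a confirmed answer: because the four P_SupplierRanking values are mutually exclusive, the nested if/termfreq expression could be replaced by four additive map clauses, in the same style as the other fields in the original message. The sketch assumes P_SupplierRanking is a single-valued numeric field that map can read, and it only builds the request parameters; the helper name ranking_bf_params is hypothetical. Whether this is actually faster has not been measured in this thread.

```python
from urllib.parse import urlencode

# (field value, boost) pairs taken from requirement 4 in the original message.
RANKING_BOOSTS = [(3, 0.3), (4, 0.6), (5, 0.9), (6, 1.2)]


def ranking_bf_params(field="P_SupplierRanking"):
    """Build one bf=map(field,v,v,boost,0) clause per ranking value.

    Each clause yields `boost` when the field equals v and 0 otherwise,
    so the clauses sum to the same value as the nested if() expression.
    """
    return [("bf", f"map({field},{value},{value},{boost},0)")
            for value, boost in RANKING_BOOSTS]


# URL-encoded form, ready to append to a request's query string.
query_string = urlencode(ranking_bf_params())
```

The resulting clauses look like bf=map(P_SupplierRanking,3,3,0.3,0), matching the form already used for P_SupplierResponseRate.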

Derek


Re: optimize boosting parameters

2020-12-08 Thread Erick Erickson
Before worrying about it too much, exactly _how_ much has
the performance changed?

I’ve just been in too many situations where there’s
no objective measure of performance before and after, just
someone saying “it seems slower”, and the performance
changes disappeared when a rigorous test was done. Then I spent
a lot of time figuring out that the person reporting the
problem hadn’t had coffee yet. Or the network was slow.
Or….
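
A minimal sketch of what an objective before/after comparison could look like, assuming response times (in milliseconds) have been collected for the same set of queries with and without the boosting parameters. The function names and the sample numbers in the usage note are hypothetical; the point is only to compare the median and a high percentile instead of an impression.

```python
import statistics


def summarize(samples_ms):
    """Return the median and nearest-rank 95th percentile of timings in ms."""
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer math.
    rank = max(1, -(-95 * len(ordered) // 100))
    return {"median": statistics.median(ordered), "p95": ordered[rank - 1]}


def compare(before_ms, after_ms):
    """Return the change (after minus before) in median and p95 latency."""
    before, after = summarize(before_ms), summarize(after_ms)
    return {key: after[key] - before[key] for key in before}
```

For example, compare([100, 110, 120], [150, 160, 170]) reports how much the median and p95 moved between two runs (made-up numbers). Running the same query set several times helps rule out network noise.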

If it does turn out to be the boosting (and IIRC the
map function can be expensive), can you pre-compute some
number of the boosts? Your requirements look
like they can be computed at index time, then boost
by just the value of the pre-computed field. BTW, boosts < 1.0
_reduce_ the score. I mention that just in case that’s a surprise ;)
Of course that means that to change the boosting you need
to re-index.
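
The pre-computation idea could be sketched as follows, assuming documents are plain dicts in the indexing pipeline and that all four requirements from the original message are folded into one stored value. The field name P_PrecomputedBoost and the function names are hypothetical. Summing is used because each bf clause adds its value to the score.

```python
# Boost tables copied from the requirements in the original message.
RATE_BOOST = {3: 0.4, 2: 0.2}
TIME_BOOST = {4: 0.4, 3: 0.2}
RANKING_BOOST = {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}


def mws_boost(score):
    """Range-based boost for P_MWSScore; missing or low scores get 0."""
    if score is None:
        return 0.0
    if 80 <= score <= 100:
        return 1.6
    if 60 <= score <= 79:
        return 0.8
    return 0.0


def precomputed_boost(doc):
    """Sum the four boosts so a single numeric field can be indexed."""
    total = (
        RATE_BOOST.get(doc.get("P_SupplierResponseRate"), 0.0)
        + TIME_BOOST.get(doc.get("P_SupplierResponseTime"), 0.0)
        + mws_boost(doc.get("P_MWSScore"))
        + RANKING_BOOST.get(doc.get("P_SupplierRanking"), 0.0)
    )
    return round(total, 2)  # avoid float noise such as 2.4000000000000004


def add_boost_field(doc):
    """Return a copy of doc with the hypothetical P_PrecomputedBoost field."""
    return {**doc, "P_PrecomputedBoost": precomputed_boost(doc)}
```

At query time the seven bf clauses would then collapse into a single one that reads the stored field, for example bf=field(P_PrecomputedBoost), assuming the field is numeric.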

You use termfreq, which changes of course, but
1> if your corpus is updated often enough, the termfreqs will be relatively
   stable. In that case you can pre-compute them too.

2> your problem statement has nothing to do with termfreq so why are you
   using it in the first place?

Best,
Erick


Re: optimize boosting parameters

2020-12-07 Thread Radu Gheorghe
Hi Derek,

Ah, then my reply was completely off :)

I don’t really see a better way. Maybe other than changing termfreq to field, 
if the numeric field has docValues? That may be faster, but I don’t know for 
sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support


Re: optimize boosting parameters

2020-12-07 Thread Derek Poh

Hi Radu

Apologies for not making myself clear.

I would like to know if there is a simpler or more efficient way to craft 
the boosting parameters based on the requirements.

For example, I am using the 'if', 'map' and 'termfreq' functions in the bf 
parameters.

Is there a more efficient or simpler function that can be used instead? Or 
a way to craft the 'formula' more efficiently?



Re: optimize boosting parameters

2020-12-07 Thread Radu Gheorghe
Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your 
data and what users expect of it. Which is a problem in itself.

I would suggest gathering judgements, like if a user queries for X, what doc 
IDs do you expect to get back?

Once you have enough of these judgements, you can experiment with boosts and 
see how the query results change. There are measures such as nDCG 
(https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that 
can help you measure that per query, and you can average this score across all 
your judgements to get an overall measure of how well you’re doing.
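
The per-query measure and its average could be sketched like this, assuming each judged query is represented as the list of graded relevance labels of the returned documents, in ranked order. The function names are illustrative, and the ideal ordering is taken from the returned list only, which is a common simplification.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the best possible ordering of the list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean_ndcg(judged_queries):
    """Average nDCG across all judged queries."""
    return sum(ndcg(rels) for rels in judged_queries) / len(judged_queries)
```

Comparing mean_ndcg for the same judgements under two boost configurations gives a single number for deciding which configuration ranks better.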

Or even better, you can have something like Quaerite play with boost values for 
you:
https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

> On 7 Dec 2020, at 10:51, Derek Poh  wrote:
> 
> Hi
> 
> I have added the following boosting requirements to the search query of a 
> page. Feedback from the monitoring team is that the overall response time of 
> the page has increased since then.
> I am trying to find out if the added boosting parameters (below) could have 
> contributed to the increase.
> 
> The boosting is working as per requirements.
> 
> May I know if the implemented boosting parameters can be enhanced or 
> optimized further?
> Hopefully to improve on the response time of the query and the page.
> 
> Requirements:
> 1. If P_SupplierResponseRate is:
>    a. 3, boost by 0.4
>    b. 2, boost by 0.2
> 
> 2. If P_SupplierResponseTime is:
>    a. 4, boost by 0.4
>    b. 3, boost by 0.2
> 
> 3. If P_MWSScore is:
>    a. between 80-100, boost by 1.6
>    b. between 60-79, boost by 0.8
> 
> 4. If P_SupplierRanking is:
>    a. 3, boost by 0.3
>    b. 4, boost by 0.6
>    c. 5, boost by 0.9
>    d. 6, boost by 1.2
> 
> Boosting parameters implemented:
> bf=map(P_SupplierResponseRate,3,3,0.4,0)
> bf=map(P_SupplierResponseRate,2,2,0.2,0)
> 
> bf=map(P_SupplierResponseTime,4,4,0.4,0)
> bf=map(P_SupplierResponseTime,3,3,0.2,0)
> 
> bf=map(P_MWSScore,80,100,1.6,0)
> bf=map(P_MWSScore,60,79,0.8,0)
> 
> bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))
> 
> 
> I am using Solr 7.7.2
> 
> --
> CONFIDENTIALITY NOTICE 
> This e-mail (including any attachments) may contain confidential and/or 
> privileged information. If you are not the intended recipient or have 
> received this e-mail in error, please inform the sender immediately and 
> delete this e-mail (including any attachments) from your computer, and you 
> must not use, disclose to anyone else or copy this e-mail (including any 
> attachments), whether in whole or in part. 
> This e-mail and any reply to it may be monitored for security, legal, 
> regulatory compliance and/or other appropriate reasons.
>