Re: optimize boosting parameters
We monitor the response time (Pingdom) of the page that uses these boosting parameters. Since the addition of these boosting parameters and of an additional field to search on (which I will start a separate thread about on the mailing list), the page's average response time has increased by 1-2 seconds. Management has given feedback on this.

> If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field.

I have gone through the list of functions, and the map function is the only one that can meet the requirements. Or is there a less expensive function that I missed?

By "pre-compute some number", do you mean checking the value of P_SupplierResponseRate at the preparation stage, before indexing? If the value = 3, specify boost="0.4" for that field of the document?

> BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;)

Oh, it is to reduce the score?! Not increase (multiply or add) the score by less than 1?

> You use termfreq, which changes of course, but 1> if your corpus is updated often enough, the termfreqs will be relatively stable. In that case you can pre-compute them too.

We do incremental indexing every half an hour on this collection, averaging 50K-100K documents per run. The collection has 7+ million documents, so the entire corpus does not get updated in every indexing run.

> 2> your problem statement has nothing to do with termfreq so why are you using it in the first place?

I read up on the termfreq function again. It returns the number of times the term appears in the field for that document, which does not really fit the requirements. Thank you for pointing it out. Should I use map instead?

Derek

On 8/12/2020 9:48 pm, Erick Erickson wrote:
> Before worrying about it too much, exactly _how_ much has the performance changed?
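For illustration, the index-time pre-computation Erick describes might look like this at the document-preparation stage. This is only a sketch: the helper and the P_PrecomputedBoost field are invented names, and the weights are taken from the requirements quoted later in the thread.

```python
def precompute_boost(doc):
    """Collapse the four boost requirements into one additive value
    that can be stored in a single numeric field at index time.
    (P_PrecomputedBoost is a hypothetical field name.)"""
    boost = 0.0
    # 1. P_SupplierResponseRate: 3 -> 0.4, 2 -> 0.2
    boost += {3: 0.4, 2: 0.2}.get(doc.get("P_SupplierResponseRate"), 0.0)
    # 2. P_SupplierResponseTime: 4 -> 0.4, 3 -> 0.2
    boost += {4: 0.4, 3: 0.2}.get(doc.get("P_SupplierResponseTime"), 0.0)
    # 3. P_MWSScore: 80-100 -> 1.6, 60-79 -> 0.8
    score = doc.get("P_MWSScore")
    if score is not None:
        if 80 <= score <= 100:
            boost += 1.6
        elif 60 <= score <= 79:
            boost += 0.8
    # 4. P_SupplierRanking: 3 -> 0.3, 4 -> 0.6, 5 -> 0.9, 6 -> 1.2
    boost += {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}.get(doc.get("P_SupplierRanking"), 0.0)
    doc["P_PrecomputedBoost"] = boost
    return doc
```

Running this once per document during the half-hourly incremental indexing would move all the per-query function evaluation to index time, at the cost (as Erick notes) of a re-index whenever the weights change.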
Re: optimize boosting parameters
Before worrying about it too much, exactly _how_ much has the performance changed?

I’ve just been in too many situations where there’s no objective measure of performance before and after, just someone saying “it seems slower”, and had those performance changes disappear when a rigorous test is done. Then spent a lot of time figuring out that the person reporting the problem hadn’t had coffee yet. Or the network was slow. Or….

If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field.

BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;) Of course that means that to change the boosting you need to re-index.

You use termfreq, which changes of course, but 1> if your corpus is updated often enough, the termfreqs will be relatively stable. In that case you can pre-compute them too. 2> your problem statement has nothing to do with termfreq, so why are you using it in the first place?

Best,
Erick
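Concretely, if the four requirement fields were collapsed at index time into a single numeric field as Erick suggests (P_PrecomputedBoost is an invented name; the actual field and its type would be up to the schema), the seven bf parameters could reduce to one. A sketch:

```text
bf=P_PrecomputedBoost
```

A bare field name is a valid function query in edismax, so this reads the pre-computed value directly per document; the trade-off, as Erick notes, is that changing any boost weight then requires a re-index.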
Re: optimize boosting parameters
Hi Derek,

Ah, then my reply was completely off :)

I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support
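As for Derek's closing question ("should I use map instead?"), one option is to rewrite the P_SupplierRanking boost in the same map style already used for the other fields. A sketch, consistent with the parameters quoted later in the thread, though whether it is actually cheaper than the termfreq version would need measuring:

```text
bf=map(P_SupplierRanking,3,3,0.3,0)
bf=map(P_SupplierRanking,4,4,0.6,0)
bf=map(P_SupplierRanking,5,5,0.9,0)
bf=map(P_SupplierRanking,6,6,1.2,0)
```

Alternatively, per Radu's suggestion, if P_SupplierRanking has docValues, reading the numeric value (e.g. via field() or the Solr 7 comparison functions) instead of termfreq() may be faster, but that is speculation without benchmarking.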
Re: optimize boosting parameters
Hi Radu

Apologies for not making myself clear.

I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements.

For example, I am using the 'if', 'map' and 'termfreq' functions in the bf parameters.

Is there a more efficient or simpler function that can be used instead? Or a way to craft the 'formula' more efficiently?
Re: optimize boosting parameters
Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself.

I would suggest gathering judgements, like: if a user queries for X, what doc IDs do you expect to get back?

Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing.

Or even better, you can have something like Quaerite play with boost values for you:
https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

> On 7 Dec 2020, at 10:51, Derek Poh wrote:
>
> Hi
>
> I have added the following boosting requirements to the search query of a page. Feedback from the monitoring team is that the overall response time of the page has increased since then.
> I am trying to find out if the added boosting parameters (below) could have contributed to the increase.
>
> The boosting is working as per the requirements.
>
> May I know if the implemented boosting parameters can be enhanced or optimized further? Hopefully to improve the response time of the query and the page.
>
> Requirements:
> 1. If P_SupplierResponseRate is:
>    a. 3, boost by 0.4
>    b. 2, boost by 0.2
>
> 2. If P_SupplierResponseTime is:
>    a. 4, boost by 0.4
>    b. 3, boost by 0.2
>
> 3. If P_MWSScore is:
>    a. between 80-100, boost by 1.6
>    b. between 60-79, boost by 0.8
>
> 4. If P_SupplierRanking is:
>    a. 3, boost by 0.3
>    b. 4, boost by 0.6
>    c. 5, boost by 0.9
>    d. 6, boost by 1.2
>
> Boosting parameters implemented:
> bf=map(P_SupplierResponseRate,3,3,0.4,0)
> bf=map(P_SupplierResponseRate,2,2,0.2,0)
>
> bf=map(P_SupplierResponseTime,4,4,0.4,0)
> bf=map(P_SupplierResponseTime,3,3,0.2,0)
>
> bf=map(P_MWSScore,80,100,1.6,0)
> bf=map(P_MWSScore,60,79,0.8,0)
>
> bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))
>
> I am using Solr 7.7.2
>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.