Re: Sort Stability With Date Boosting and Rounding
That would improve things for recent documents. But documents that are close to each other and a long time from NOW would still have very small differences, and those differences would be susceptible to rounding errors that can cause results to get shuffled.

Stephen Duncan Jr
www.stephenduncanjr.com

On Tue, Feb 22, 2011 at 6:07 PM, David Yang dy...@nextjump.com wrote:

One suggestion: use logarithms to compress the large time range into something easier to compare: 1/log(ms(now,date))
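To illustrate the objection in this reply, here is a rough sketch that compares the gap between two boosts with the spacing of 32-bit floats near that value. It assumes the score is held as a 32-bit float and that log means base 10; the helper names and the sample ages are made up for the illustration.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def log_boost(age_ms: float) -> float:
    """The suggested boost 1/log(ms(now,date)), assuming a base-10 log."""
    return 1.0 / math.log10(age_ms)

def float32_spacing(x: float) -> float:
    """Gap between adjacent 32-bit floats near x (x > 0, normal range)."""
    _, exponent = math.frexp(x)
    return 2.0 ** (exponent - 24)

YEAR_MS = 365 * 24 * 3600 * 1000

# Two recent documents, 10 and 12 seconds old: the boosts are far apart.
recent_gap = log_boost(10_000) - log_boost(12_000)
recent_distinct = to_float32(log_boost(10_000)) != to_float32(log_boost(12_000))

# Two documents about a year old, also 2 seconds apart: the gap is smaller
# than the distance between neighbouring 32-bit floats, so whether they
# round to the same value depends on where NOW happens to fall.
old_gap = log_boost(YEAR_MS) - log_boost(YEAR_MS + 2_000)
old_spacing = float32_spacing(log_boost(YEAR_MS))

print(recent_distinct, recent_gap)
print(old_gap < old_spacing, old_gap, old_spacing)
```

The logarithm helps near NOW, but for old documents the gap still falls below float precision, which is the case described above.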
Re: Sort Stability With Date Boosting and Rounding
Hi,

This seems to be a tricky issue, judging from the other replies. I'm just thinking outside the box now, and the following options come to mind:

1) Can you store the timestamp in the session in your middleware for each user? This way it stays fixed and doesn't change the order between requests. Of course, the order can still change when new documents are committed, but this cannot be avoided.

2) If you have frequent commits, you might find a way to modify Solr's RandomSortField to create a NOW for each commit. The timestamp remains fixed for all consecutive requests if you use the same field for the timestamp every time. So instead of generating a random value, you'd just compute the current timestamp, and the behavior will stay the same as RandomSortField.

Cheers
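The first option above (a timestamp kept fixed per user session) could look roughly like the sketch below. It assumes ms() accepts a literal ISO-8601 date constant in place of NOW; the `session` dict stands in for whatever session storage the middleware provides, and `pinned_boost` and the `boost_now` key are hypothetical names.

```python
from datetime import datetime, timezone

def pinned_boost(session: dict, field: str = "manufacturedate_dt") -> str:
    """Build the bf value with a timestamp that is fixed for the session."""
    if "boost_now" not in session:
        # First request of the session: remember the current UTC time.
        session["boost_now"] = datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    # Later requests reuse the stored value, so the boost does not drift
    # while the user pages through the results.
    return f"recip(ms({session['boost_now']},{field}),3.16e-11,1,1)"

session = {}
first_page_bf = pinned_boost(session)
second_page_bf = pinned_boost(session)
print(first_page_bf)
print(first_page_bf == second_page_bf)
```

Because every page of the same session sends an identical function, the rounding of each document's boost is the same on every request until the session (or the index) changes.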
Sort Stability With Date Boosting and Rounding
I'm trying to use http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents as a bf parameter to my dismax handler.

The problem is, the value of NOW can cause documents in a similar range (date value within a few seconds of each other) to sometimes round to be equal, and sometimes not, changing their sort order (when equal, falling back to a secondary sort). This, in turn, screws up paging.

The problem is that score is rounded to a lower level of precision than what the suggested formula produces as a difference between two values within seconds of each other. It seems to me if I could round the value to minutes or hours, where the difference will be large enough to not be rounded out, then I wouldn't have problems with order changing on me. But it's not legal syntax to specify something like:

recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved? Anyone have suggested solutions, other than indexing a copy of the date field that's rounded to the hour?

--
Stephen Duncan Jr
www.stephenduncanjr.com
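The rounding effect described in this message can be reproduced outside Solr with a small sketch. It assumes the boost ends up in a 32-bit float and uses the documented definition recip(x,m,a,b) = a/(m*x+b); the two document dates and the helper names are invented for the illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def recip(x: float, m: float, a: float, b: float) -> float:
    """recip(x,m,a,b) computes a / (m*x + b)."""
    return a / (m * x + b)

def boosts_tie(now_ms: int, date_a_ms: int, date_b_ms: int) -> bool:
    """True if both documents get the same boost after float32 rounding."""
    boost_a = to_float32(recip(now_ms - date_a_ms, 3.16e-11, 1, 1))
    boost_b = to_float32(recip(now_ms - date_b_ms, 3.16e-11, 1, 1))
    return boost_a == boost_b

YEAR_MS = 365 * 24 * 3600 * 1000

# Two hypothetical documents dated 2 seconds apart, about a year before NOW.
doc_a_ms, doc_b_ms = 0, 2_000

# Move NOW forward in 50 ms steps over 20 seconds and collect the outcomes.
outcomes = {
    boosts_tie(YEAR_MS + offset, doc_a_ms, doc_b_ms)
    for offset in range(0, 20_000, 50)
}
print(outcomes)
```

The set contains both True and False: for most values of NOW the two boosts round to the same float, and for some they do not, which matches the behaviour reported here.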
RE: Sort Stability With Date Boosting and Rounding
One suggestion: use logarithms to compress the large time range into something easier to compare: 1/log(ms(now,date))

-Original Message-
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
Sent: Tuesday, February 22, 2011 6:03 PM
To: solr-user@lucene.apache.org
Subject: Sort Stability With Date Boosting and Rounding

I'm trying to use http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents as a bf parameter to my dismax handler. [...]
Re: Sort Stability With Date Boosting and Rounding
You could always use a secondary sort as a tie-breaker, e.g. something unique like 'documentid'. That would ensure a stable sort.

2011/2/23 Stephen Duncan Jr stephen.dun...@gmail.com

I'm trying to use http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents as a bf parameter to my dismax handler. [...]
Re: Sort Stability With Date Boosting and Rounding
Hi,

You're right, it's illegal syntax to use other functions in the ms function, which is a pity indeed. However, you reduce the score by 50% for each year. Therefore paging through the results shouldn't make that much of a difference, because the difference in score with NOW+2 minutes has a negligible impact on the total score. I had some thoughts on this issue as well, but I decided the impact was too little to bother about.

Cheers,
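As a quick check of the numbers behind this reply, the sketch below evaluates the FAQ formula at a few ages, using the documented definition recip(x,m,a,b) = a/(m*x+b). The helper names and the chosen ages are assumptions for the illustration. Note that the formula gives roughly 1/(1 + age in years), so the boost is about one half after one year and about one third after two, rather than halving every year.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """recip(x,m,a,b) computes a / (m*x + b)."""
    return a / (m * x + b)

YEAR_MS = 365 * 24 * 3600 * 1000

def date_boost(age_ms: float) -> float:
    """The FAQ formula recip(ms(NOW,date),3.16e-11,1,1) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1)

# Boost at a few ages, in years.
for years in (0, 1, 2, 5):
    print(years, date_boost(years * YEAR_MS))

# How much a one-year-old document's boost moves when NOW advances 2 minutes.
two_minute_shift = date_boost(YEAR_MS) - date_boost(YEAR_MS + 120_000)
print(two_minute_shift)
```

The two-minute shift is tiny compared with the boost itself, which is the point made above; the thread's concern is that shifts this small sit close to the precision of the score.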
Re: Sort Stability With Date Boosting and Rounding
The problem comes when you have results that all have the same natural score (because you've filtered them, with no primary search, for instance) and are very close together in time. Then, as you page through, the order changes. So the user experience is that they see duplicate documents and miss out on some of the docs in the overall set. It's not something negligible that I can ignore. I either have to come up with a fix for this, or get rid of the boost function altogether.

Stephen Duncan Jr
www.stephenduncanjr.com

On Tue, Feb 22, 2011 at 6:09 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi,

You're right, it's illegal syntax to use other functions in the ms function, which is a pity indeed. However, you reduce the score by 50% for each year. Therefore paging through the results shouldn't make that much of a difference, because the difference in score with NOW+2 minutes has a negligible impact on the total score. I had some thoughts on this issue as well, but I decided the impact was too little to bother about.

Cheers,
Re: Sort Stability With Date Boosting and Rounding
No, the problem is that, due to rounding, sometimes the docs ARE considered ties, and therefore the secondary sort is used, but sometimes they don't round to exactly equal, so the tie-breaker isn't used and the results get shuffled.

Stephen Duncan Jr
www.stephenduncanjr.com

On Tue, Feb 22, 2011 at 6:09 PM, Geert-Jan Brits gbr...@gmail.com wrote:

You could always use a secondary sort as a tie-breaker, e.g. something unique like 'documentid'. That would ensure a stable sort.
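The sketch below plays through this scenario: the same two documents are sorted by boost and then by id, once at a NOW where the boosts tie and once at a NOW where they do not. It assumes a 32-bit float score; the document ids "a" and "b", their dates, and the helper names are invented for the illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def date_boost(now_ms: int, date_ms: int) -> float:
    """recip(ms(NOW,date),3.16e-11,1,1), rounded to a 32-bit float."""
    return to_float32(1.0 / (3.16e-11 * (now_ms - date_ms) + 1.0))

YEAR_MS = 365 * 24 * 3600 * 1000

# Hypothetical documents: "b" is 2 seconds newer than "a", both ~1 year old.
DOCS = {"a": 0, "b": 2_000}

def ranked_ids(now_ms: int) -> list:
    """Sort by boost descending, then by id ascending as the tie-breaker."""
    return sorted(DOCS, key=lambda doc_id: (-date_boost(now_ms, DOCS[doc_id]), doc_id))

def find_now(want_tie: bool) -> int:
    """Scan NOW in 50 ms steps for a value where the boosts tie (or do not)."""
    for offset in range(0, 20_000, 50):
        now_ms = YEAR_MS + offset
        tie = date_boost(now_ms, DOCS["a"]) == date_boost(now_ms, DOCS["b"])
        if tie == want_tie:
            return now_ms
    raise ValueError("no matching NOW in the scanned range")

now_tie = find_now(True)
now_no_tie = find_now(False)

# Page size 1: page 1 is fetched at one NOW, page 2 at another.
page_1 = ranked_ids(now_tie)[0:1]
page_2 = ranked_ids(now_no_tie)[1:2]
print(ranked_ids(now_tie), ranked_ids(now_no_tie))
print(page_1, page_2)
```

When the boosts tie, the id decides and "a" comes first; when they do not, the newer "b" comes first. Both pages therefore return "a" and "b" is never shown, so a unique secondary sort alone does not stabilise paging here.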