Re: Sort Stability With Date Boosting and Rounding

2011-02-23 Thread Stephen Duncan Jr
That would improve things for recent documents, but documents that were
close to each other, but a long time from NOW, would still have very small
differences that would be susceptible to rounding errors that can cause
results to get shuffled.

Stephen Duncan Jr
www.stephenduncanjr.com
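
The objection above can be checked numerically. What follows is a minimal sketch, not Solr code: it assumes the final boost is held as a 32-bit float (the thread only says the score is rounded to a lower precision), it assumes log is base 10 (the conclusion does not depend on the base), and the ages are made up. The helpers to_float32, log_boost and collisions are illustrative names.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def log_boost(age_ms):
    """1/log(ms(NOW,date)) as suggested in the thread, rounded to 32 bits.
    Assumption: base-10 log and a float32 result."""
    return to_float32(1.0 / math.log10(age_ms))

def collisions(age_ms, gap_ms=1000, sweep_ms=10_000, step_ms=100):
    """Count the NOW values in a sweep for which two documents
    gap_ms apart end up with exactly the same rounded boost."""
    equal = 0
    for shift in range(0, sweep_ms, step_ms):
        older = log_boost(age_ms + gap_ms + shift)
        newer = log_boost(age_ms + shift)
        if older == newer:
            equal += 1
    return equal

MINUTE_MS = 60 * 1000
YEAR_MS = 365 * 24 * 60 * MINUTE_MS

recent_equal = collisions(MINUTE_MS)  # pair of documents about a minute old
old_equal = collisions(YEAR_MS)       # pair of documents about a year old
print(recent_equal, old_equal)
```

Under these assumptions the minute-old pair never collapses to the same value, while the year-old pair does so for most values of NOW in the sweep, which is the case the reply is worried about.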


On Tue, Feb 22, 2011 at 6:07 PM, David Yang dy...@nextjump.com wrote:

 One suggestion: use logarithms to compress the large time range into
 something easier to compare: 1/log(ms(now,date))

 -----Original Message-----
 From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
 Sent: Tuesday, February 22, 2011 6:03 PM
 To: solr-user@lucene.apache.org
 Subject: Sort Stability With Date Boosting and Rounding

 I'm trying to use

 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
 as
 a bf parameter to my dismax handler.  The problem is, the value of NOW can
 cause documents in a similar range (date value within a few seconds of each
 other) to sometimes round to be equal, and sometimes not, changing their
 sort order (when equal, falling back to a secondary sort).  This, in turn,
 screws up paging.

 The problem is that score is rounded to a lower level of precision than what
 the suggested formula produces as a difference between two values within
 seconds of each other.  It seems to me if I could round the value to minutes
 or hours, where the difference will be large enough to not be rounded-out,
 then I wouldn't have problems with order changing on me.  But it's not legal
 syntax to specify something like:
 recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

 Is this a problem anyone has faced and solved?  Anyone have suggested
 solutions, other than indexing a copy of the date field that's rounded to
 the hour?

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com



Re: Sort Stability With Date Boosting and Rounding

2011-02-23 Thread Markus Jelsma
Hi,

This seems to be a tricky issue judging from the other replies. I'm just 
thinking out of the box now and the following options come to mind:

1) can you store the timestamp in the session in your middleware for each 
user? This way it stays fixed and doesn't change the order between requests. Of 
course, the order can still change when new documents are committed but this 
cannot be avoided. 

2) if you have frequent commits, you might find a way to modify Solr's
RandomSortField to create a NOW for each commit. The timestamp remains fixed
for all consecutive requests if you use the same field for the timestamp
every time. So instead of generating a random value, you'd just compute the
current timestamp and the behavior will stay the same as RandomSortField.

Cheers
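
Option 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the session is a plain dict standing in for whatever the middleware provides, the key solr_now_ms and the helper names are made up, the unique key field is assumed to be called id, and it relies on Solr accepting a NOW request parameter (milliseconds since the epoch) that overrides the server clock. Whether the Solr version in use supports that parameter needs to be checked.

```python
import time
from urllib.parse import urlencode

def pinned_now_ms(session, clock=time.time):
    """Return the timestamp stored in the session, creating it on first use.
    'solr_now_ms' is a hypothetical session key."""
    if "solr_now_ms" not in session:
        session["solr_now_ms"] = int(clock() * 1000)
    return session["solr_now_ms"]

def build_query(session, start, rows=10, clock=time.time):
    """Build the query string for one page of results.
    Every page of the same session is sent with the same NOW."""
    params = {
        "defType": "dismax",
        "q.alt": "*:*",
        "bf": "recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)",
        "sort": "score desc, id asc",    # assumes a unique 'id' field
        "start": start,
        "rows": rows,
        "NOW": pinned_now_ms(session, clock),  # assumed request parameter
    }
    return urlencode(params)

session = {}
page_one = build_query(session, start=0)
page_two = build_query(session, start=10)
```

Because both pages carry the same NOW, the boost of each document is computed from the same reference time, so the rounding falls the same way on every page until the session value is reset.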

 The problem comes when you have results that are all the same natural score
 (because you've filtered them, with no primary search, for instance), and
 are very close together in time.  Then, as you page through, the order
 changes.  So the user experience is that they see duplicate documents, and
 miss out on some of the docs in the overall set.  It's not something
 negligible that I can ignore.  I either have to come up with a fix for
 this, or get rid of the boost function altogether.
 
 Stephen Duncan Jr
 www.stephenduncanjr.com
 
 
 On Tue, Feb 22, 2011 at 6:09 PM, Markus Jelsma markus.jel...@openindex.io wrote:
  Hi,
  
  You're right, it's illegal syntax to use other functions in the ms function,
  which is a pity indeed.
  
  However, you reduce the score by 50% for each year. Therefore paging through
  the results shouldn't make that much of a difference because the difference
  in score with NOW+2 minutes has a negligible impact on the total score.
  
  I had some thoughts on this issue as well but I decided the impact was
  too little to bother about.
  
  Cheers,
  
   I'm trying to use
   http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
   as a bf parameter to my dismax handler.  The problem is, the value of NOW
   can cause documents in a similar range (date value within a few seconds of
   each other) to sometimes round to be equal, and sometimes not, changing
   their sort order (when equal, falling back to a secondary sort).  This, in
   turn, screws up paging.
   
   The problem is that score is rounded to a lower level of precision than
   what the suggested formula produces as a difference between two values
   within seconds of each other.  It seems to me if I could round the
   value to minutes or hours, where the difference will be large enough
   to not be rounded-out, then I wouldn't have problems with order
   changing on me.  But it's not legal syntax to specify something like:
   recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
   
   Is this a problem anyone has faced and solved?  Anyone have suggested
   solutions, other than indexing a copy of the date field that's rounded
   to the hour?
   
   --
   Stephen Duncan Jr
   www.stephenduncanjr.com


Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr
I'm trying to use
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
as
a bf parameter to my dismax handler.  The problem is, the value of NOW can
cause documents in a similar range (date value within a few seconds of each
other) to sometimes round to be equal, and sometimes not, changing their
sort order (when equal, falling back to a secondary sort).  This, in turn,
screws up paging.

The problem is that score is rounded to a lower level of precision than what
the suggested formula produces as a difference between two values within
seconds of each other.  It seems to me if I could round the value to minutes
or hours, where the difference will be large enough to not be rounded-out,
then I wouldn't have problems with order changing on me.  But it's not legal
syntax to specify something like:
recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved?  Anyone have suggested
solutions, other than indexing a copy of the date field that's rounded to
the hour?

--
Stephen Duncan Jr
www.stephenduncanjr.com
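
For reference, the fallback mentioned at the end of the message (indexing a copy of the date rounded to the hour) could be prepared on the client side along these lines. This is a sketch only: the target field name manufacturedate_hour_dt and the helper names are made up, and the document is a plain dict rather than any particular Solr client object.

```python
from datetime import datetime, timezone

def to_utc_hour(dt):
    """Convert to UTC and drop minutes, seconds and microseconds."""
    return dt.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)

def solr_date(dt):
    """Format a UTC datetime as an ISO-8601 string with a trailing Z."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

def with_rounded_date(doc, source="manufacturedate_dt",
                      target="manufacturedate_hour_dt"):
    """Return a copy of doc with an extra, hour-rounded date field.
    'target' is a hypothetical field name that the schema would need."""
    prepared = dict(doc)
    prepared[source] = solr_date(doc[source].astimezone(timezone.utc))
    prepared[target] = solr_date(to_utc_hour(doc[source]))
    return prepared

doc = {
    "id": "42",
    "manufacturedate_dt": datetime(2011, 2, 22, 18, 3, 41, tzinfo=timezone.utc),
}
indexed = with_rounded_date(doc)
print(indexed["manufacturedate_hour_dt"])
```

The boost would then reference the rounded field, for example recip(ms(NOW,manufacturedate_hour_dt),3.16e-11,1,1), so that documents from the same hour always receive identical boosts and always fall through to the secondary sort.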


RE: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread David Yang
One suggestion: use logarithms to compress the large time range into something
easier to compare: 1/log(ms(now,date))

-----Original Message-----
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] 
Sent: Tuesday, February 22, 2011 6:03 PM
To: solr-user@lucene.apache.org
Subject: Sort Stability With Date Boosting and Rounding

I'm trying to use
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
as
a bf parameter to my dismax handler.  The problem is, the value of NOW can
cause documents in a similar range (date value within a few seconds of each
other) to sometimes round to be equal, and sometimes not, changing their
sort order (when equal, falling back to a secondary sort).  This, in turn,
screws up paging.

The problem is that score is rounded to a lower level of precision than what
the suggested formula produces as a difference between two values within
seconds of each other.  It seems to me if I could round the value to minutes
or hours, where the difference will be large enough to not be rounded-out,
then I wouldn't have problems with order changing on me.  But it's not legal
syntax to specify something like:
recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved?  Anyone have suggested
solutions, other than indexing a copy of the date field that's rounded to
the hour?

--
Stephen Duncan Jr
www.stephenduncanjr.com


Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Geert-Jan Brits
You could always use a secondary sort as a tie-breaker, e.g. something
unique like 'documentid'. That would ensure a stable sort.

2011/2/23 Stephen Duncan Jr stephen.dun...@gmail.com

 I'm trying to use

 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
 as
 a bf parameter to my dismax handler.  The problem is, the value of NOW can
 cause documents in a similar range (date value within a few seconds of each
 other) to sometimes round to be equal, and sometimes not, changing their
 sort order (when equal, falling back to a secondary sort).  This, in turn,
 screws up paging.

 The problem is that score is rounded to a lower level of precision than what
 the suggested formula produces as a difference between two values within
 seconds of each other.  It seems to me if I could round the value to minutes
 or hours, where the difference will be large enough to not be rounded-out,
 then I wouldn't have problems with order changing on me.  But it's not legal
 syntax to specify something like:
 recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

 Is this a problem anyone has faced and solved?  Anyone have suggested
 solutions, other than indexing a copy of the date field that's rounded to
 the hour?

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com



Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Markus Jelsma
Hi,

You're right, it's illegal syntax to use other functions in the ms function, 
which is a pity indeed.

However, you reduce the score by 50% for each year. Therefore paging through
the results shouldn't make that much of a difference because the difference in
score with NOW+2 minutes has a negligible impact on the total score.

I had some thoughts on this issue as well but I decided the impact was too
little to bother about.

Cheers,

 I'm trying to use
 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
 as
 a bf parameter to my dismax handler.  The problem is, the value of NOW can
 cause documents in a similar range (date value within a few seconds of each
 other) to sometimes round to be equal, and sometimes not, changing their
 sort order (when equal, falling back to a secondary sort).  This, in turn,
 screws up paging.
 
 The problem is that score is rounded to a lower level of precision than
 what the suggested formula produces as a difference between two values
 within seconds of each other.  It seems to me if I could round the value
 to minutes or hours, where the difference will be large enough to not be
 rounded-out, then I wouldn't have problems with order changing on me.  But
 it's not legal syntax to specify something like:
 recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
 
 Is this a problem anyone has faced and solved?  Anyone have suggested
 solutions, other than indexing a copy of the date field that's rounded to
 the hour?
 
 --
 Stephen Duncan Jr
 www.stephenduncanjr.com


Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr
The problem comes when you have results that are all the same natural score
(because you've filtered them, with no primary search, for instance), and
are very close together in time.  Then, as you page through, the order
changes.  So the user experience is that they see duplicate documents, and
miss out on some of the docs in the overall set.  It's not something
negligible that I can ignore.  I either have to come up with a fix for this,
or get rid of the boost function altogether.

Stephen Duncan Jr
www.stephenduncanjr.com


On Tue, Feb 22, 2011 at 6:09 PM, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi,

 You're right, it's illegal syntax to use other functions in the ms function,
 which is a pity indeed.

 However, you reduce the score by 50% for each year. Therefore paging through
 the results shouldn't make that much of a difference because the difference
 in score with NOW+2 minutes has a negligible impact on the total score.

 I had some thoughts on this issue as well but I decided the impact was too
 little to bother about.

 Cheers,

  I'm trying to use
  http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
  as a bf parameter to my dismax handler.  The problem is, the value of NOW
  can cause documents in a similar range (date value within a few seconds of
  each other) to sometimes round to be equal, and sometimes not, changing their
  sort order (when equal, falling back to a secondary sort).  This, in turn,
  screws up paging.
 
  The problem is that score is rounded to a lower level of precision than
  what the suggested formula produces as a difference between two values
  within seconds of each other.  It seems to me if I could round the value
  to minutes or hours, where the difference will be large enough to not be
  rounded-out, then I wouldn't have problems with order changing on me.  But
  it's not legal syntax to specify something like:
  recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
 
  Is this a problem anyone has faced and solved?  Anyone have suggested
  solutions, other than indexing a copy of the date field that's rounded to
  the hour?
 
  --
  Stephen Duncan Jr
  www.stephenduncanjr.com



Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr
No, the problem is that, due to rounding, sometimes the docs ARE considered
ties, and therefore the secondary sort is used, but sometimes they don't
round to exactly equal, and the tiebreaker isn't used, and the results get
shuffled.

Stephen Duncan Jr
www.stephenduncanjr.com
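
The effect described above can be reproduced outside Solr. The sketch below is an illustration under assumptions, not Solr's implementation: it computes the FAQ formula recip(x,m,a,b) = a/(m*x+b) in double precision and rounds only the result to a 32-bit float, and the document ages are made up. It then advances NOW in small steps and counts how often two documents one second apart tie.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def recip_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """recip(x, m, a, b) = a / (m*x + b), rounded to 32 bits (assumption)."""
    return to_float32(a / (m * age_ms + b))

DAY_MS = 24 * 60 * 60 * 1000
age_newer = 30 * DAY_MS        # document A, about 30 days old
age_older = age_newer + 1000   # document B, one second older

tied = 0
ordered = 0
for shift in range(0, 10_000, 100):   # NOW advancing in 100 ms steps
    boost_a = recip_boost(age_newer + shift)
    boost_b = recip_boost(age_older + shift)
    if boost_a == boost_b:
        tied += 1      # the secondary sort decides the order
    else:
        ordered += 1   # the boost decides the order
print(tied, ordered)
```

With these numbers both counters come out non-zero: for some values of NOW the pair is a tie and the secondary sort applies, for others the boost alone orders them. If the secondary sort happens to disagree with the date order, the two documents swap places between requests.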


On Tue, Feb 22, 2011 at 6:09 PM, Geert-Jan Brits gbr...@gmail.com wrote:

 You could always use a secondary sort as a tie-breaker, e.g. something
 unique like 'documentid'. That would ensure a stable sort.

 2011/2/23 Stephen Duncan Jr stephen.dun...@gmail.com

  I'm trying to use
  http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
  as a bf parameter to my dismax handler.  The problem is, the value of NOW
  can cause documents in a similar range (date value within a few seconds of
  each other) to sometimes round to be equal, and sometimes not, changing their
  sort order (when equal, falling back to a secondary sort).  This, in turn,
  screws up paging.
 
  The problem is that score is rounded to a lower level of precision than
  what the suggested formula produces as a difference between two values
  within seconds of each other.  It seems to me if I could round the value to
  minutes or hours, where the difference will be large enough to not be
  rounded-out, then I wouldn't have problems with order changing on me.  But
  it's not legal syntax to specify something like:
  recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
 
  Is this a problem anyone has faced and solved?  Anyone have suggested
  solutions, other than indexing a copy of the date field that's rounded to
  the hour?
 
  --
  Stephen Duncan Jr
  www.stephenduncanjr.com