On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote: > I have a requirement to only return one result for all documents whose > timestamps fall within N seconds of one another. (where timestamp is a > field and N is an integer). > > For example, Document A is timestamped "12:00:00" and Document B has > timestamp "12:00:30", Document B should be discarded. On the other > hand, if Document B has timestamp "12:01:00" then I should return both > (assuming 30 < N < 59 seconds). > > Similarly, if Documents A, B, and C have timestamps "12:00:00", > "12:00:30", and "12:01:00" respectively, only Document A should be > returned (because B is close to A, and C is close to B). > > If it helps to simplify things, we can assume results are sorted by > time. Also, I can apply logic at index time or at search time. > > Any suggestions? This is a pretty tough concept to search the > archives for...
How big is the corpus and how many hits do you estimate a search can result in? Can you just take the penalty from iterating the hits? --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]