: What this issue doesn't discuss is what to do with partial results obtained : when a timeout occurred. As the original poster points out, document lists are : traversed in the order they were added and not the order of their importance, : which introduces a bias to partial results in that they reflect results from a : random sample (which is likely not the most relevant, i.e. there could have : been more relevant results later in the traversal order). : : The answer to this issue is org.apache.nutch.indexer.IndexSorter, which
skimming this it doesn't seem like a refactored version that was less nutch specific cold make a handy contrib ... but it also seems like there may be a simpler approach for the (i assume) common case of prefering docs that were indexed later.... if we eliminate the requirement for *strict* preference of recent documents and make that a more loose desire, then we coulnd't we do a pretty good job if we just changed Segment merging to reorder reverse the order of the segments before each merge? it wouldn't be very useful to start doing this on an index that's already a decent size, but if this was happening on every merge right from the very begining, then the most recent documents would percollate to the front of the index right? The only downside i can think of would be that docids would frequently (not not very predictably) change even if there were no deletions .. but you'd pay that same penalty with something like the nutch's IndexSorter. I'm not much of an expert on segment merging.. but with the exception of docid order i can'tthink of many reasons why there couldn't be a merger that revesed the order of hte segments. -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]