Chris Hostetter wrote:
: What this issue doesn't discuss is what to do with partial results obtained
: when a timeout occurred. As the original poster points out, document lists are
: traversed in the order they were added and not the order of their importance,
: which introduces a bias to partial results in that they reflect results from a
: random sample (which is likely not the most relevant, i.e. there could have
: been more relevant results later in the traversal order).
: : The answer to this issue is org.apache.nutch.indexer.IndexSorter, which

skimming this, it does seem like a refactored version that was less Nutch-specific could make a handy contrib ... but it also seems like there may be a simpler approach for the (i assume) common case of preferring docs that were indexed later....

if we eliminate the requirement for *strict* preference of recent documents and make that a looser desire, then couldn't we do a pretty good job if we just changed segment merging to reverse the order of the segments before each merge? it wouldn't be very useful to start doing this on an index that's already a decent size, but if this was happening on every merge right from the very beginning, then the most recent documents would percolate to the front of the index, right?
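To make the segment-reversal idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions: it is not Lucene's merge code, a segment is modelled as a plain list of document numbers, and `merge_reversed`, `add_document`, `traversal_order` and the `merge_factor` parameter are hypothetical names for a simplified log-style merge.

```python
from typing import List

# Assumption: a segment is just the list of documents it holds, in traversal order.
Segment = List[int]


def merge_reversed(segments: List[Segment]) -> Segment:
    """Merge segments newest-first; the order inside each segment is kept."""
    merged: Segment = []
    for seg in reversed(segments):
        merged.extend(seg)
    return merged


def add_document(levels: List[List[Segment]], doc: int, merge_factor: int = 2) -> None:
    """Add one document as a new level-0 segment and cascade merges upward."""
    if not levels:
        levels.append([])
    levels[0].append([doc])
    level = 0
    while len(levels[level]) >= merge_factor:
        merged = merge_reversed(levels[level])
        levels[level] = []
        if level + 1 == len(levels):
            levels.append([])
        levels[level + 1].append(merged)
        level += 1


def traversal_order(levels: List[List[Segment]]) -> List[int]:
    """Order in which documents are visited: highest (largest) level first."""
    order: List[int] = []
    for level in reversed(levels):
        for seg in level:
            order.extend(seg)
    return order


levels: List[List[Segment]] = []
for doc in range(4):  # documents are numbered in the order they were added
    add_document(levels, doc)
print(traversal_order(levels))  # [3, 2, 1, 0]
```

In this model, four documents with a merge factor of 2 end up fully reversed, while three documents give `[1, 0, 2]` because the newest segment has not been merged yet. That is the "loose" preference described above: recent documents move to the front only as merges happen.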

The only downside i can think of would be that docids would change frequently (and not very predictably) even if there were no deletions .. but you'd pay that same penalty with something like Nutch's IndexSorter.

I'm not much of an expert on segment merging.. but with the exception of docid order i can't think of many reasons why there couldn't be a merger that reversed the order of the segments.

I think this would be too messy - currently we can be sure of the simple rule that documents added to the index get incrementally higher docids, i.e. the higher the docid, the more recent the document. I think it would be much simpler to write a FilterIndexReader that simply reverses the order of docids.
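The reversal itself is just a fixed mapping, new = maxDoc - 1 - old. The Python sketch below shows that mapping on a toy in-memory index. It does not use Lucene's FilterIndexReader API; the `ReversedDocIdView` class, its methods, and the dict-of-lists postings layout are assumptions made only for illustration.

```python
from typing import Dict, List


class ReversedDocIdView:
    """Read-only view of a toy index with docids flipped: new = max_doc - 1 - old."""

    def __init__(self, docs: List[str], postings: Dict[str, List[int]]) -> None:
        self._docs = docs          # stored documents, indexed by original docid
        self._postings = postings  # term -> original docids, in increasing order
        self.max_doc = len(docs)

    def _flip(self, docid: int) -> int:
        # The mapping is its own inverse, so it converts in both directions.
        return self.max_doc - 1 - docid

    def document(self, docid: int) -> str:
        return self._docs[self._flip(docid)]

    def term_docs(self, term: str) -> List[int]:
        # Walk the original postings backwards so the flipped ids stay increasing.
        return [self._flip(d) for d in reversed(self._postings.get(term, []))]


view = ReversedDocIdView(["oldest", "middle", "newest"], {"x": [0, 1]})
print(view.document(0))     # newest
print(view.term_docs("x"))  # [1, 2]
```

The underlying index keeps the "higher docid means more recent" rule; only the view presents the most recent documents first.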

The point of Nutch's IndexSorter is that it allows you to reorder docids in an arbitrary manner, using a user-supplied mapping between old and new docids, which can be based on values retrieved from the current index or from any other source. I think that flexibility is the main value of this class, and it makes it applicable to various scenarios.
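As a rough illustration of such a user-supplied mapping, the Python sketch below builds an old-to-new docid table from a per-document sort key and applies it to toy postings. It is not the IndexSorter implementation; `build_old_to_new`, `remap_postings`, and the idea of using a per-document score as the key are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def build_old_to_new(sort_keys: Sequence[float]) -> List[int]:
    """Return old_to_new, where the document with the highest key gets docid 0.

    sort_keys[old] is any per-document value (hypothetically a score or boost).
    Ties keep their original relative order because sorted() is stable.
    """
    by_rank = sorted(range(len(sort_keys)), key=lambda old: -sort_keys[old])
    old_to_new = [0] * len(sort_keys)
    for new, old in enumerate(by_rank):
        old_to_new[old] = new
    return old_to_new


def remap_postings(postings: Dict[str, List[int]], old_to_new: List[int]) -> Dict[str, List[int]]:
    """Rewrite every posting list with new docids, keeping each list sorted."""
    return {term: sorted(old_to_new[d] for d in docids) for term, docids in postings.items()}


mapping = build_old_to_new([0.1, 0.9, 0.5])
print(mapping)                                # [2, 0, 1]
print(remap_postings({"a": [0, 1]}, mapping))  # {'a': [0, 2]}
```

Reversing the docid order is then just the special case where the sort key is the original docid.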

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

