On Sat, Sep 17, 2011 at 07:46:12AM +0200, goran kent wrote:
> On Sat, Sep 17, 2011 at 12:56 AM, Marvin Humphrey
> <[email protected]> wrote:
> > On Fri, Sep 16, 2011 at 03:00:21PM +0200, goran kent wrote:
> >> Any support for collapsing duplicate documents based on a field?
> >
> > I wrote a DedupingSearcher class for KinoSearch a while ago that did exactly
> > this, and I'd be happy to contribute it to the ASF. It will take some
> > modernizing to get it compatible with Lucy, though.
>
> Any possibility of squeezing that into your schedule?

Contributing it is no problem. I won't get to the modernization myself, but
if someone wants to take it on I'll be happy to collaborate with them.

> > The algorithm is to rerun the search if there is not sufficient diversity in
> > the search results, adding exclusions to the query each time to suppress the
> > unwanted hits.
>
> ouch, that doesn't sound good for performance. Am I right?

Haven't measured.

Marvin Humphrey
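
For readers following the thread, the rerun-with-exclusions idea quoted above can be sketched roughly as follows. This is not the DedupingSearcher code and does not use the KinoSearch or Lucy API. It is a minimal Python illustration over an in-memory list, and toy_search, deduping_search, and max_passes are hypothetical names made up for the example. "Sufficient diversity" is assumed here to mean that no two hits share a value in the chosen field.

```python
from typing import Dict, List, Set

Doc = Dict[str, str]


def toy_search(docs: List[Doc], term: str, excluded: Set[int], limit: int) -> List[int]:
    """Hypothetical stand-in for a real searcher.

    Returns up to `limit` doc ids whose "body" contains `term`,
    in index order, skipping any id listed in `excluded`.
    """
    hits: List[int] = []
    for doc_id, doc in enumerate(docs):
        if doc_id in excluded:
            continue
        if term in doc["body"]:
            hits.append(doc_id)
            if len(hits) == limit:
                break
    return hits


def deduping_search(docs: List[Doc], term: str, field: str,
                    wanted: int, max_passes: int = 10) -> List[int]:
    """Rerun the search, excluding duplicate hits, until no two hits share `field`.

    `max_passes` is an assumed safety cap, not something from the thread.
    """
    excluded: Set[int] = set()
    seen: Dict[str, int] = {}
    for _ in range(max_passes):
        hits = toy_search(docs, term, excluded, wanted)
        seen = {}
        dupes: List[int] = []
        for doc_id in hits:
            value = docs[doc_id][field]
            if value in seen:
                dupes.append(doc_id)   # lower-ranked hit with a repeated value
            else:
                seen[value] = doc_id   # first (best-ranked) hit for this value
        if not dupes:
            return hits                # results are diverse enough; stop
        excluded.update(dupes)         # suppress the unwanted hits and rerun
    # Cap reached: return the deduplicated hits from the last pass.
    return list(seen.values())
```

Each pass that still finds duplicates costs one more full search, so the number of reruns depends on how many duplicates crowd the top of the result list. The sketch makes no claim about how that cost behaves in practice.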
