Re: DuplicatesFilter - one for contrib?

Grant Ingersoll Tue, 02 Oct 2007 08:26:25 -0700

+1

-Grant
On Sep 30, 2007, at 4:47 PM, markharw00d wrote:

I've put together a new Filter and JUnit test for eliminating duplicates from search results.
The typical usage scenario is where multiple documents in the index share an untokenized field value (e.g. the same primary key or URL). It is desirable to keep these copies in the index because some searches need to see the multiple versions (e.g. to view a revision history for a document). However, when a search needs to return only one version of each document (often the latest version), this filter can be used as an efficient means of filtering results. The bitset it produces marks ALL the "master" docs in an index for a field, so the filter can be safely cached for reuse with any query:
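       // One "master" doc is kept per distinct value of KEY_FIELD_NAME
       DuplicateFilter df = new DuplicateFilter(KEY_FIELD_NAME);
       // Keep the last occurrence of each key (e.g. the latest revision)
       df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
       Hits h = searcher.search(query, df);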
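For anyone wondering how a query-independent "master" bitset behaves, here is a minimal, Lucene-free sketch of the idea described above. It is not the DuplicateFilter source: the class DuplicateBitsSketch, the methods masterDocs and filter, and the KEEP_FIRST/KEEP_LAST constants are hypothetical, and the plain array of per-document keys stands in for reading the untokenized field from the index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the contrib DuplicateFilter code): mark one
 * "master" document per distinct key, then intersect query matches with it.
 */
public class DuplicateBitsSketch {

    // Hypothetical keep modes: first or last document number per key.
    public static final int KEEP_FIRST = 0;
    public static final int KEEP_LAST = 1;

    /**
     * keys[doc] is the untokenized key of document number doc.
     * A null key leaves the doc unmarked (an assumption of this sketch).
     */
    public static BitSet masterDocs(String[] keys, int keepMode) {
        Map<String, Integer> chosen = new HashMap<>();
        for (int doc = 0; doc < keys.length; doc++) {
            String key = keys[doc];
            if (key == null) {
                continue;
            }
            // KEEP_LAST: later doc numbers overwrite earlier ones.
            // KEEP_FIRST: only the first doc seen for a key is stored.
            if (keepMode == KEEP_LAST || !chosen.containsKey(key)) {
                chosen.put(key, doc);
            }
        }
        BitSet bits = new BitSet(keys.length);
        for (int doc : chosen.values()) {
            bits.set(doc);
        }
        return bits;
    }

    /** Returns the query's matches restricted to master docs (inputs untouched). */
    public static BitSet filter(BitSet queryMatches, BitSet masters) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(masters);
        return result;
    }

    public static void main(String[] args) {
        // Docs 0 and 2 share "/a"; docs 1 and 4 share "/b".
        String[] urls = {"/a", "/b", "/a", "/c", "/b"};

        // Computed once from the index alone, so it can be cached.
        BitSet masters = masterDocs(urls, KEEP_LAST);
        System.out.println(masters);                   // {2, 3, 4}

        BitSet queryOne = new BitSet();
        queryOne.set(0, 5);                            // every doc matched
        System.out.println(filter(queryOne, masters)); // {2, 3, 4}

        BitSet queryTwo = new BitSet();
        queryTwo.set(0);
        queryTwo.set(3);
        System.out.println(filter(queryTwo, masters)); // {3}
    }
}
```

Because the master bitset depends only on the field values and not on the query, the same bitset serves both queries. The second query also shows a consequence of that design: it matched only the older copy of "/a" (doc 0), so it gets no hit for that key at all; the filter does not substitute doc 2, because doc 2 did not match the query.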


If anyone else finds this useful, I'll commit it.

Cheers
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
