Hi Zhangchi

Thanks for your reply. 

We have about 3 million records (distinct ISBNs) in the database, and
slightly more documents than that in the index. We don't want to
de-duplicate at indexing time, because one book (one ISBN) can be listed
under two or more categories (fiction, comics & novels, science, etc.).

We had actually applied the filter on the primary key, i.e. Id, and it
wasn't working, so I was hoping for some sample code. We then found out that
the field we wanted the duplicate filter applied to (Id) was not actually
indexed when it was added to the document, i.e. Field.Index was set to NO.
We changed this, repopulated the index, and the filtering works now.
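
For anyone who finds this thread later, here is a rough plain-Java model of
why the indexing flag mattered. It is not Lucene code and not how
DuplicateFilter is implemented; it only assumes that the filter groups hits
by the indexed terms of the chosen field, which is consistent with what we
saw. The class name, the Doc type, the idIndexed flag and the "isbn-..."
values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of per-field de-duplication; NOT Lucene's implementation.
class DuplicateFilterSketch {

    // Hypothetical stand-in for one index document: a book under one category.
    static final class Doc {
        final String id;          // book Id, the de-duplication key
        final String category;    // e.g. "fiction", "science"
        final boolean idIndexed;  // false models Field.Index.NO on the Id field

        Doc(String id, String category, boolean idIndexed) {
            this.id = id;
            this.category = category;
            this.idIndexed = idIndexed;
        }
    }

    // Keeps one hit (the last one seen) per distinct indexed Id.
    // In this model, a hit whose Id was not indexed exposes no term to
    // group on, so it never makes it into the result.
    static List<Doc> uniqueById(List<Doc> hits) {
        Map<String, Doc> lastPerId = new LinkedHashMap<>();
        for (Doc d : hits) {
            if (d.idIndexed) {
                lastPerId.put(d.id, d);
            }
        }
        return new ArrayList<>(lastPerId.values());
    }

    public static void main(String[] args) {
        // Same book under two categories, plus one other book; Id indexed.
        List<Doc> indexed = new ArrayList<>();
        indexed.add(new Doc("isbn-1", "fiction", true));
        indexed.add(new Doc("isbn-1", "comics & novels", true));
        indexed.add(new Doc("isbn-2", "science", true));
        System.out.println(uniqueById(indexed).size()); // prints 2

        // Same data with the Id field stored but not indexed.
        List<Doc> notIndexed = new ArrayList<>();
        notIndexed.add(new Doc("isbn-1", "fiction", false));
        notIndexed.add(new Doc("isbn-1", "comics & novels", false));
        notIndexed.add(new Doc("isbn-2", "science", false));
        System.out.println(uniqueById(notIndexed).size()); // prints 0
    }
}
```

In the real index the equivalent of idIndexed is the Field.Index setting on
the Id field, and the unique count is the hit count of the search once the
duplicate filter on Id is applied.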

Thanks for your time.




zhangchi wrote:
> 
> 
> I think you should check the index first, using lukeall (Luke) to see
> whether it contains duplicate books.
> 
> On Thu, 04 Mar 2010 20:43:26 +0800, ani...@ekkitab <ani...@ekkitab.com>  
> wrote:
> 
>>
>> Hi there, could someone help me with the usage of DuplicateFilter? Here is
>> my problem.
>>
>> I have created a search index on book Id, title, and author from a database
>> of books that fall under various categories. Some books fall under more
>> than one category. Now, when I issue a search, I get back 'X' books matching
>> the search criteria, some of which are repeated because those books are in
>> different documents, which is the expected behaviour.
>>
>> I use TopFieldDocCollector.getTotalHits() to get the total count, but
>> this includes the repeats mentioned above, so it is not the actual
>> count. When I issue a search on title or author, I want to get a unique
>> count / list of books. How do I use DuplicateFilter to achieve this?
>>
>> Please help
>>
>> Regards
>> Anish
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/how-to-use-DuplicateFilter-to-get-unique-documents-based-on-a-fieldName-tp27780251p27790391.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

