hi Daniel, How do you use a separate database to check the duplicate fields? It is interesting! Best Regards. jacky ----- Original Message ----- From: "Daniel Noll" <[EMAIL PROTECTED]> To: <java-user@lucene.apache.org> Sent: Friday, September 08, 2006 3:08 PM Subject: Re: duplicate fields
> jacky wrote: > > hi, 1. Is there an effect method to check if there exists the same > > field(hold a unique ID) when added into lucene index database? Make a > > search for this field? > > One way is to create an IndexReader and IndexSearcher on your index, > which you reopen every now and then. But we do this task by using a > separate database, for the sake of efficiency. > > > 2. Is there an effect method to check if there exists the duplicate > > fields(hold a unique ID) in the lucene index database? Two methods: > > Read all documents and compare the fields, or search for each field. > > Is there a better one? > > The simplest way without using an external database is to use the > termDocs enumeration. For each term you can easily see which ones have > multiple documents, so every document other than the first for each term > is a duplicate (which you could then use to build a filter to remove > duplicates.) > > Daniel > > > > -- > Daniel Noll > > Nuix Pty Ltd > Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699 > Web: http://www.nuix.com.au/ Fax: +61 2 9212 6902 > > This message is intended only for the named recipient. If you are not > the intended recipient you are notified that disclosing, copying, > distributing or taking any action in reliance on the contents of this > message or attachment is strictly prohibited. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] >