Re: Indexing and Duplication

Ype Kingma Sat, 16 Mar 2002 13:59:32 -0800

Kelvin,

>I've got a little problem with indexing that I'd like to throw to everyone.
>
>My objects have a unique identifier. When indexing, before I create a new
>document, I'd like to check if a document has already been created with this
>identifier. If so, I'd like to retrieve the document corresponding to this
>identifier, and add the fields I currently have to this document's fields
>and write it. If no such document exists, then I'd create a new document,
>add my fields and write it. What this really does, I guess, is ensure that a
>document object represents a body of information which really belongs
>together, eliminating duplication.
>
>With the current API, writing and retrieving is performed by the IndexWriter
>and IndexReader respectively. This effectively means that in order to do the
>above, I'd have to close the writer, create a new instance of the index
>reader after each document has been added in order for the reader to have
>the most updated version of the index (!).
>
>Does anyone have any suggestions how I might approach this?


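Avoid closing and opening too often by working in batches of n docs:
first do all the lookups for the batch on the index reader, then do
everything needed for those n docs on the index writer. You might have
to delete docs on the reader, too (the old version of any doc that you
are about to re-add with the merged fields).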
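
To make the batching idea concrete, here is a rough sketch in plain Java.
It does not call Lucene at all: the "index" is just an in-memory map, and
the names BatchedMerge, indexBatch, indexAll, readerOpens and writerOpens
are made up for illustration. The counters stand in for opening an
IndexReader or IndexWriter, so you can see that each batch costs one
reader pass and one writer pass instead of one per document. It also
assumes that the fields of an existing doc can be read back in full,
which may not hold for every field in a real index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchedMerge {
    // Hypothetical stand-in for the index: unique identifier -> fields.
    static final Map<String, Map<String, String>> index = new HashMap<>();
    // Counters standing in for "open a reader" / "open a writer".
    static int readerOpens = 0;
    static int writerOpens = 0;

    // Handle one batch: a single reader phase, then a single writer phase.
    // Each record is {identifier, fieldName, fieldValue}.
    static void indexBatch(List<String[]> batch) {
        // Merge records sharing an identifier inside the batch itself,
        // because the reader cannot see docs that are not written yet.
        Map<String, Map<String, String>> pending = new LinkedHashMap<>();
        for (String[] rec : batch) {
            pending.computeIfAbsent(rec[0], k -> new LinkedHashMap<>())
                   .put(rec[1], rec[2]);
        }

        // Reader phase: look up each identifier, fold the old fields into
        // the pending doc, and delete the old doc.
        readerOpens++;
        for (Map.Entry<String, Map<String, String>> e : pending.entrySet()) {
            Map<String, String> old = index.remove(e.getKey());
            if (old != null) {
                Map<String, String> merged = new LinkedHashMap<>(old);
                merged.putAll(e.getValue()); // new values win on a name clash
                e.setValue(merged);
            }
        }

        // Writer phase: add every merged doc of the batch in one go.
        writerOpens++;
        index.putAll(pending);
    }

    // Split the input into batches of n records and process each in turn.
    static void indexAll(List<String[]> records, int n) {
        for (int i = 0; i < records.size(); i += n) {
            indexBatch(records.subList(i, Math.min(i + n, records.size())));
        }
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"a", "title", "T1"});
        records.add(new String[] {"b", "title", "T2"});
        records.add(new String[] {"a", "body", "B1"});
        indexAll(records, 2);
        System.out.println(index + " reader opens: " + readerOpens
                + ", writer opens: " + writerOpens);
    }
}
```

A larger n means fewer open/close cycles, at the price of holding more
pending docs in memory before they are written.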

The reasons for using the reader for reading/searching/deleting
and the writer for adding were discussed some time ago on this
list. I can't provide a pointer into the list archives as I don't recall
the original subject header, sorry.

Regards,
Ype
