Kelvin,

>I've got a little problem with indexing that I'd like to throw to everyone.
>
>My objects have a unique identifier. When indexing, before I create a new
>document, I'd like to check if a document has already been created with this
>identifier. If so, I'd like to retrieve the document corresponding to this
>identifier, and add the fields I currently have to this document's fields
>and write it. If no such document exists, then I'd create a new document,
>add my fields and write it. What this really does, I guess, is ensure that a
>document object represents a body of information which really belongs
>together, eliminating duplication.
>
>With the current API, writing and retrieving is performed by the IndexWriter
>and IndexReader respectively. This effectively means that in order to do the
>above, I'd have to close the writer, create a new instance of the index
>reader after each document has been added in order for the reader to have
>the most updated version of the index (!).
>
>Does anyone have any suggestions how I might approach this?

Avoid closing and opening too often by batching n docs at a time: first do
the lookups for the batch on the index reader, then do whatever is needed
for those n docs on the index writer. You might have to delete docs on the
reader, too.

The reasons for using the reader for reading/searching/deleting and the
writer for adding were discussed some time ago on this list. I can't provide
a pointer into the list archives as I don't recall the original subject
header, sorry.

Regards,
Ype
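
For illustration only, here is a rough sketch of the batching idea in Java. It does not use the Lucene API: the index is modelled as a plain in-memory map from identifier to fields, and the names BatchMergeSketch, Incoming and mergeBatch are made up for this example. The point is only the shape of the loop: one "reader" pass over the whole batch (look up and delete existing docs, merge fields), then one "writer" pass (add the merged docs), so the reader/writer switch happens once per batch rather than once per document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index: identifier -> field name -> value.
// This is NOT the Lucene API; it only models the batching pattern.
class BatchMergeSketch {

    /** One incoming object: its unique identifier plus the fields to add. */
    static final class Incoming {
        final String id;
        final Map<String, String> fields;

        Incoming(String id, Map<String, String> fields) {
            this.id = id;
            this.fields = fields;
        }
    }

    /**
     * Merges a batch into the index and returns how many existing
     * documents were replaced by a merged version.
     */
    static int mergeBatch(Map<String, Map<String, String>> index,
                          List<Incoming> batch) {
        // "Reader" phase: for each identifier in the batch, fetch and delete
        // the existing document (if any) and collect the merged fields.
        Map<String, Map<String, String>> pending = new LinkedHashMap<>();
        int replaced = 0;
        for (Incoming in : batch) {
            Map<String, String> doc = pending.get(in.id);
            if (doc == null) {
                doc = new LinkedHashMap<>();
                Map<String, String> existing = index.remove(in.id);
                if (existing != null) {
                    doc.putAll(existing);
                    replaced++;
                }
                pending.put(in.id, doc);
            }
            // Simplification: a new value overwrites an old value with the
            // same field name instead of being kept alongside it.
            doc.putAll(in.fields);
        }

        // "Writer" phase: add every merged document in one go.
        index.putAll(pending);
        return replaced;
    }
}
```

With a real index, the two phases would correspond to one reader session and one writer session per batch; the batch size n is a tuning choice, and two objects with the same identifier inside one batch are merged via the pending map before anything is written.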