Ype,

That would be a good solution to my problem if only I weren't performing
multi-threaded indexing. :(
The Reader obtained by any one thread may not be an accurate reflection of
the actual state of the index, only of the state at the time the Reader was
instantiated.

My current solution is to hold a collection of documents keyed by my object
identifier and only write them to the writer after indexing is done. I chose
it because it saves me from having to write a document, then delete it, and
so on. However, it's not ideal because the memory consumed by such an
approach may be prohibitive.
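
For illustration, the buffering approach described above might be sketched
roughly as follows. This is only a sketch under stated assumptions:
DocumentBuffer, addField and flush are hypothetical names, and plain maps of
field values stand in for Lucene Document objects, so no Lucene calls appear
here. It assumes identifiers are strings and that flush is called only after
all indexing threads have finished (for example, after joining them).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Lucene: buffers fields per object
// identifier so that each identifier ends up as exactly one document.
class DocumentBuffer {
    // identifier -> (field name -> values); a stand-in for Lucene Documents
    private final Map<String, Map<String, List<String>>> pending =
            new ConcurrentHashMap<>();

    // Safe to call from several indexing threads: ConcurrentHashMap.compute
    // runs the merge atomically for a given identifier.
    void addField(String id, String name, String value) {
        pending.compute(id, (key, fields) -> {
            Map<String, List<String>> merged = fields;
            if (merged == null) {
                merged = new LinkedHashMap<String, List<String>>();
            }
            merged.computeIfAbsent(name, n -> new ArrayList<String>()).add(value);
            return merged;
        });
    }

    // Number of distinct identifiers currently buffered.
    int size() {
        return pending.size();
    }

    // Hands every buffered entry to the sink, then empties the buffer.
    // The sink is where a real implementation would build a Document
    // and pass it to the writer (assumption: single caller, after indexing).
    void flush(BiConsumer<String, Map<String, List<String>>> sink) {
        pending.forEach(sink);
        pending.clear();
    }
}
```

Everything stays in memory until flush is called, so the memory concern
mentioned above applies to this sketch as well.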

What do you think?

Regards,
Kelvin
----- Original Message -----
From: "Ype Kingma" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, March 17, 2002 6:15 AM
Subject: Re: Indexing and Duplication


> Kelvin,
>
> >I've got a little problem with indexing that I'd like to throw to
> >everyone.
> >
> >My objects have a unique identifier. When indexing, before I create a new
> >document, I'd like to check if a document has already been created with
> >this identifier. If so, I'd like to retrieve the document corresponding
> >to this identifier, and add the fields I currently have to this
> >document's fields and write it. If no such document exists, then I'd
> >create a new document, add my fields and write it. What this really does,
> >I guess, is ensure that a document object represents a body of
> >information which really belongs together, eliminating duplication.
> >
> >With the current API, writing and retrieving is performed by the
> >IndexWriter and IndexReader respectively. This effectively means that in
> >order to do the above, I'd have to close the writer, create a new
> >instance of the index reader after each document has been added in order
> >for the reader to have the most updated version of the index (!).
> >
> >Does anyone have any suggestions how I might approach this?
>
> Avoid closing and opening too much by batching n docs at a time
> on the index reader and then do the things needed for the n docs on the
> index writer. You might have to delete docs on the reader, too.
>
> The reasons for using the reader for reading/searching/deleting
> and using the writer for adding have been discussed some time ago on this
> list. I can't provide a pointer into the list archives as I don't recall
> the original subject header, sorry.
>
> Regards,
> Ype
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>


