Yes, opening/closing will be very costly. But I *believe*, although I haven't tried it, that IndexModifier (2.1) will work for you.
But do NOT take my word for it as I haven't tried to do what you're doing. But it should be easy to write a short test or two to prove that you can find recently-inserted documents.... Best Erick On 6/28/07, Jens Grivolla <[EMAIL PROTECTED]> wrote:
Hi, I have a Lucene index with a few million entries, and I will need to add batches of a few hundred thousand or a few million additional entries. Unfortunately, I absolutely need to have all indexed entries available when inserting a new one, even within one batch, in order to do some duplicate detection (using Lucene). I believe this means having to close the writer and reopen the reader to reflect the changes after each add. I'm thinking of having the original index remain static and add the new entries to a separate index and merge the two later on. I can even query them separately during the batch insertion if that gives me better performance. The questions: How costly is merging two big indexes? When / how often do I need to call optimize() on the new index? Should I just keep MergeFactor at the default value? If I have autoCommit=true, I can keep the writer open, but still need to flush() and reopen readers to reflect the changes, right? Is it better to have a MultiReader on both indexes or query them separately so I don't have to reopen the old one every time? Additional info: Documents are very short, just a few words each. I have 4 gigs of RAM in the machine, of which I could allocate quite a bit as heap for the writer if that helps. Thanks for any hints on how to best go about this, Jens P.S.: all my mails to the list get silently dropped when sending through GMail, possibly because the sender is not the same as the from: header (and only the from: actually contains the subscribed address). This is very annoying and makes it impossible for me to write to the list in many situations. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]