Thanks Erick. You might be joking, but one of our clients indeed had all his servers destroyed in a flood. Of course in this rare case, a solution would be to keep the backup on another site.
However I'm still confused about normal scenarios: Let's say that in the middle of the batch I got an exception and wan't to rollback. Can I do this ? I want to make sure that after a batch finishes (and only then), it is written to disk (and not find out after a while during a commit that something went wrong).Do I have to close the writer or Flush is enough? I though about raising mergeFactor and other parameters to high values (or disabling them) so an automatic merge/commit will not happen, and then I can manually decide when to commit the changes - the size of the batches is not constant so I can't determine in advance. I don't mind hurting the index performance a bit by doing this manually, but I can't efford to let the client think that the information is secured in the index and than to find out that it's not. My index size contains a few million docs and it's size can reach about 30G (we're saving a lot of fields and information for each document). Having a backup index is an option I considered but I wanted to avoid the overhead of keeping them synchronized (they might not be on the same server which exposes a lot of new problems like network issues). Thanks. On Thu, Jun 26, 2008 at 5:42 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > How big is your index? The simpleminded way would be to copy things around > as your batches come in and only switch to the *real* one after the > additions > were verified. > > You could also just maintain two indexes but only update one at a time. In > the > 99.99% case where things went well, it would just be a matter of continuing > on. > Whenever "something bad happened", you could copy the good index over the > bad one and go at it again. > > But to ask that no matter what, the index is OK is asking a lot.... There > are fires and floods and earthquakes to consider <G> > > Best > Erick > > On Thu, Jun 26, 2008 at 10:28 AM, Eran Sevi <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > I'm looking for the correct way to create an index given the following > > restrictions: > > > > 1. The documents are received in batches of variable sizes (not more then > > 100 docs in a batch). > > 2. The batch insertion must be transactional - either the whole batch is > > added to the index (exists physically on the disk), or the whole batch is > > canceled/aborted and the index remains as before. > > 3. The index must remain valid at all times and shouldn't be corrupted > even > > if a power interruption occurs - *most important* > > 4. Index speed is less important than search speed. > > > > How should I use a writer with all these restrictions? Can I do it > without > > having to close the writer after each batch (maybe flush is enough)? > > > > Should I change the IndexWriter parameters such as mergeFactor, > > RAMBufferSize, etc. ? > > I want to make sure that partial batches are not written to the disk (if > > the > > computer crashes in the middle of the batch, I want to be able to work > with > > the index as it was before the crash). > > > > If I'm working with a single writer, is it guaranteed that no matter what > > happens the index can be opened and used (I don't mind loosing docs, just > > that the index won't be ruined). > > > > Thanks and sorry about the long list of questions, > > Eran. > > >