> Since Lucene buffers in memory, you will always have the risk of
> losing recently added documents that haven't been flushed yet.
> Committing on every document would be too slow to be practical.

Well it is not sooo sloooow...

I have indexed 10.000 docs, resulting in 14 MB index. The index has 2 stored
fields and the tokenized content field.

With a commit after every add: 30 min.
With a commit after 100 add: 23 min.
Only one commit: 20 min.

(including time to get the document from the archive)

I use lucene 2.3 so a commit is a combination of closing and creating the
writer.
2.4/3.0 has a commit method which may be faster.

Before this test I thought it would be much slower than 30 min...

So one has to decide if correctness is more important than performance.

I use a batch size of 100, first committing lucene, then committing the
database which holds the status of the document if it is already indexed or
not.
If the db commit fails it is no problem, because my app does not care about
multiple indexed documents. But until now neither the lucene nor the db
commit ever failed...



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to