On Thu, Mar 6, 2008 at 12:22 PM, <[EMAIL PROTECTED]> wrote:
> > Since Lucene buffers in memory, you will always have the risk of
> > losing recently added documents that haven't been flushed yet.
> > Committing on every document would be too slow to be practical.
>
> Well, it is not sooo sloooow...
>
> I have indexed 10,000 docs, resulting in a 14 MB index. The index has 2 stored
> fields and the tokenized content field.
>
> With a commit after every add: 30 min.
> With a commit after every 100 adds: 23 min.
> Only one commit: 20 min.
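A quick back-of-the-envelope check on the quoted timings (10,000 documents, 20 to 30 minutes in total) turns them into per-document and per-commit figures. `CommitTimings` is a hypothetical helper written for illustration, not something from the thread; it only does arithmetic on the numbers above.

```java
/** Hypothetical helper: arithmetic on the quoted timings for 10,000 documents. */
class CommitTimings {
    static final int DOCS = 10_000;

    /** Average wall-clock milliseconds per document for a run of the given length. */
    static double msPerDoc(double minutes) {
        return minutes * 60_000.0 / DOCS;
    }

    /** Extra milliseconds per additional commit, relative to a single-commit run. */
    static double msPerExtraCommit(double minutes, double baselineMinutes, int commits) {
        return (minutes - baselineMinutes) * 60_000.0 / (commits - 1);
    }

    public static void main(String[] args) {
        System.out.printf("one commit:       %.0f ms/doc%n", msPerDoc(20)); // 120
        System.out.printf("commit every add: %.0f ms/doc%n", msPerDoc(30)); // 180
        // 10,000 commits vs. 1 commit: about 60 ms extra per commit
        System.out.printf("extra per commit (every add): %.0f ms%n",
                msPerExtraCommit(30, 20, DOCS));
        // 100 commits vs. 1 commit: about 1818 ms extra per commit
        System.out.printf("extra per commit (every 100): %.0f ms%n",
                msPerExtraCommit(23, 20, DOCS / 100));
    }
}
```

With the quoted figures this gives about 120 ms per document even with a single commit. The two per-commit estimates (about 60 ms vs. about 1.8 s) disagree by more than an order of magnitude, so these end-to-end numbers, which also include fetching each document from the archive, are only a rough guide to what the commits themselves cost.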
All of these times look pretty slow... perhaps Lucene is not the bottleneck here?

-Yonik

> (including time to get the document from the archive)
>
> I use Lucene 2.3, so a commit is a combination of closing and creating the
> writer.
> 2.4/3.0 have a commit method, which may be faster.
>
> Before this test I thought it would be much slower than 30 min...
>
> So one has to decide whether correctness is more important than performance.
>
> I use a batch size of 100, first committing Lucene, then committing the
> database that records whether each document has already been indexed.
> If the DB commit fails, that is no problem, because my app does not mind
> documents being indexed more than once. But so far, neither the Lucene
> commit nor the DB commit has ever failed...
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
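The batching scheme described in the quoted message (commit the index every 100 documents, then commit the database that records which documents are indexed) can be sketched roughly as follows. `SearchIndex`, `StatusStore` and the in-memory fakes are hypothetical stand-ins, not Lucene or JDBC APIs. With Lucene 2.4 or later the index commit would presumably map to `IndexWriter.commit()`; on 2.3 it would be closing and reopening the writer, as described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: commit the index first, then the status DB, once per batch. */
class BatchIndexer {
    /** Hypothetical index abstraction (not a Lucene API). */
    interface SearchIndex {
        void add(String docId);
        void commit(); // e.g. IndexWriter.commit() on 2.4+, close + reopen on 2.3
    }

    /** Hypothetical status store; markIndexed is staged until commit(). */
    interface StatusStore {
        void markIndexed(String docId);
        void commit();
    }

    static void indexAll(List<String> docIds, int batchSize,
                         SearchIndex index, StatusStore status) {
        List<String> pending = new ArrayList<>();
        for (String id : docIds) {
            index.add(id);
            pending.add(id);
            if (pending.size() == batchSize) {
                flush(pending, index, status);
            }
        }
        if (!pending.isEmpty()) {
            flush(pending, index, status); // final partial batch
        }
    }

    private static void flush(List<String> pending, SearchIndex index, StatusStore status) {
        // 1. Make the batch durable in the index first.
        index.commit();
        // 2. Only then record it as indexed. If this step fails, the batch is
        //    re-indexed on the next run: harmless when duplicates are tolerated.
        for (String id : pending) {
            status.markIndexed(id);
        }
        status.commit();
        pending.clear();
    }

    /** In-memory fake that counts commits, for demonstration only. */
    static class CountingIndex implements SearchIndex {
        final List<String> committed = new ArrayList<>();
        private final List<String> buffered = new ArrayList<>();
        int commits = 0;
        public void add(String docId) { buffered.add(docId); }
        public void commit() { committed.addAll(buffered); buffered.clear(); commits++; }
    }

    /** In-memory fake status store, for demonstration only. */
    static class CountingStatus implements StatusStore {
        final List<String> indexed = new ArrayList<>();
        private final List<String> staged = new ArrayList<>();
        int commits = 0;
        public void markIndexed(String docId) { staged.add(docId); }
        public void commit() { indexed.addAll(staged); staged.clear(); commits++; }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) ids.add("doc-" + i);
        CountingIndex index = new CountingIndex();
        CountingStatus status = new CountingStatus();
        indexAll(ids, 100, index, status);
        // prints "3 index commits, 250 docs marked"
        System.out.println(index.commits + " index commits, "
                + status.indexed.size() + " docs marked");
    }
}
```

The ordering is what makes a failed DB commit harmless: the worst case is that a batch already in the index gets indexed again, which the poster's application tolerates. Committing the DB first could instead mark documents as indexed that never reached the index.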