On Thu, Mar 6, 2008 at 12:22 PM,  <[EMAIL PROTECTED]> wrote:
> > Since Lucene buffers in memory, you will always have the risk of
>  > losing recently added documents that haven't been flushed yet.
>  > Committing on every document would be too slow to be practical.
>
>  Well, it is not that slow...
>
>  I have indexed 10,000 docs, resulting in a 14 MB index. The index has 2 stored
>  fields and one tokenized content field.
>
>  With a commit after every add: 30 min.
>  With a commit after every 100 adds: 23 min.
>  With only one commit at the end: 20 min.

All of these times look pretty slow... perhaps Lucene is not the
bottleneck here?

-Yonik

>  (including time to get the document from the archive)
>
>  I use Lucene 2.3, so a commit is a combination of closing and re-creating the
>  writer.
>  2.4/3.0 has a commit() method, which may be faster.
>
>  Before this test I thought it would be much slower than 30 min...
>
>  So one has to decide whether correctness is more important than performance.
>
>  I use a batch size of 100, first committing Lucene, then committing the
>  database, which holds the status of whether a document has already been
>  indexed or not.
>  If the db commit fails, it is no problem, because my app does not care about
>  multiply indexed documents. But until now, neither the Lucene nor the db
>  commit has ever failed...
>
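The batching strategy described above can be sketched in plain Java. This is a minimal simulation, not actual Lucene code: the `BatchIndexer` class, its fields, and the in-memory stand-ins for the Lucene index and the status database are all hypothetical names invented for illustration. The point it shows is the ordering: the index is committed first, and only then is the DB status updated, so a failure at either step can only cause harmless re-indexing, never a document marked "indexed" that was lost before a flush.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the poster's scheme: buffer adds, commit the
// "index" every BATCH_SIZE documents, then mark them indexed in the "db".
public class BatchIndexer {
    static final int BATCH_SIZE = 100;

    // Stand-in for the committed Lucene index.
    final List<String> committedIndex = new ArrayList<>();
    // Stand-in for the database table holding each doc's indexed status.
    final Set<String> dbIndexedStatus = new HashSet<>();
    // Documents added but not yet committed (would be lost on a crash).
    final List<String> buffer = new ArrayList<>();

    void addDocument(String docId) {
        buffer.add(docId);
        if (buffer.size() >= BATCH_SIZE) {
            commitBatch();
        }
    }

    void commitBatch() {
        if (buffer.isEmpty()) return;
        // 1. Commit the index first. If this fails, the db still says
        //    "not indexed", so the docs are simply re-added on the next run.
        committedIndex.addAll(buffer);
        // 2. Only then commit the db status. If THIS step fails, the docs
        //    get indexed twice on retry -- harmless here, as the poster notes.
        dbIndexedStatus.addAll(buffer);
        buffer.clear();
    }

    public static void main(String[] args) {
        BatchIndexer ix = new BatchIndexer();
        for (int i = 0; i < 250; i++) ix.addDocument("doc" + i);
        ix.commitBatch(); // flush the final partial batch
        System.out.println(ix.committedIndex.size() + " docs committed");
        // prints "250 docs committed"
    }
}
```

Reversing the two steps (db first, index second) would risk marking a document as indexed even though it only ever lived in Lucene's in-memory buffer, which is exactly the data-loss window the thread is discussing.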

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
