On Fri, 2008-03-07 at 15:38 +0100, [EMAIL PROTECTED] wrote:
> > With a commit after every add: (286 sec / 10,000 docs) 28.6 ms.
> > With a commit after every 100 add: (12 sec / 10,000 docs) 1.2 ms.
> > Only one commit: (8 sec / 10,000 docs) 0.8 ms.
> 
> Of couse. If you need so less time to create a document than a commit which
> may take, lets say 10 - 500 ms, will slow down indexing heavily.

I tried doing the same with 1 million documents and contrary to my
expectations, the amortized time for commits didn't increase for the
frequent commits. I thought that the merging of the segments were more
expensive. But then again - the resulting index is "only" 1.3GB.

With a commit after every add: (21818 sec / 1,000,000 docs) 21.818 ms.
With a commit after every 100 add: (1879 sec / 1,000,000 docs) 1.879 ms.
Only one commit: (1270 sec / 1,000,000 docs) 1.270 ms.


While I were at it, I tried running the test on SSDs and fast harddisks
on our dual-core test-server.

2 Sandisk solid state drives in RAID 0:
With a commit after every add: (1460 sec / 1,000,000 docs) 1.460 ms.
With a commit after every 100 add: (409 sec / 1,000,000 docs) 0.400 ms.
Only one commit: (321 sec / 1,000,000 docs) 0.317 ms.

2 MTRON solid state drives in RAID 0:
With a commit after every add: (1495 sec / 1,000,000 docs) 1.495 ms.
With a commit after every 100 add: (353 sec / 1,000,000 docs) 0.353 ms.
Only one commit: (296 sec / 1,000,000 docs) 0.296 ms.

2 10.000 RPM harddisks in RAID 0:
With a commit after every add: (1781 sec / 1,000,000 docs) 1.781 ms.
With a commit after every 100 add: (380 sec / 1,000,000 docs) 0.373 ms.
Only one commit: (335 sec / 1,000,000 docs) 0.323 ms.

2 15.000 RPM harddisks in RAID 1:
With a commit after every add: (1876 sec / 1,000,000 docs) 1.876 ms.
With a commit after every 100 add: (458 sec / 1,000,000 docs) 0.458 ms.
Only one commit: (361 sec / 1,000,000 docs)  ms.

Not enough data for real statistics, I know, but on the faster server
system, the penalty for frequent commits doesn't appear to be so bad.

> So it really depends on the use case and how long it takes to index a single
> document, inclusive retrieval of the document from ist source.

Absolutely.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to