AW: solr performance

2007-02-20 Thread Burkamp, Christian
I do agree. There's probably no need to go to the index directly.
My current Solr test server has more than 5M documents and a size of about 60GB.
I still index at 13 docs per second, and that includes filtering of the
documents.
(If you have your content ready in XML format, performance will be even better.)
Indexing performance does not seem to drop as the index grows.
Optimizing the index, however, does take a huge amount of time for large indexes.

--Christian

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 20 February 2007 11:43
To: solr-user@lucene.apache.org
Subject: Re: solr performance


You could build your index using Lucene directly and then point a
Solr instance at it once it's built.  My suspicion is that the
overhead of forming a document as an XML string and posting to Solr  
via HTTP won't be that much different than indexing with Lucene  
directly.
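
For anyone who has not seen the XML route, here is a minimal sketch of
forming documents as an XML string and posting them over HTTP. It assumes
Solr's XML update format (`<add><doc><field name="...">`) and an update
handler at http://localhost:8983/solr/update; that URL and the helper names
`build_add_xml` and `post_xml` are illustrative assumptions, not something
from this thread.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: update handler URL of a local Solr instance; adjust as needed.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts into one <add> message (bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="utf-8")


def post_xml(payload, url=SOLR_UPDATE_URL):
    """POST an XML update message (bytes) and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would be along the lines of `post_xml(build_add_xml(batch))` for each
batch, followed by `post_xml(b"<commit/>")` at the end. Putting many
documents into one `<add>` message keeps the per-request HTTP overhead small.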

My largest Solr index is currently at 1.4M and it takes a max of 3ms  
to add a document (according to Solr's console), most of them 1ms.   
My single-threaded indexer is indexing around 1000 documents per
minute, but I think I can push this rate higher by
parallelizing the indexer.
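
A rough sketch of what parallelizing could look like, assuming the client
side (building and posting documents) is the bottleneck: split the documents
into batches and send them from a small thread pool. `send_batch` is a
hypothetical callable that posts one batch to Solr and raises on failure;
the batch size and worker count are arbitrary starting points, not measured
values.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently via `send_batch`; return the number of docs sent.

    `send_batch` is a hypothetical callable supplied by the caller.
    """
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes every result, so an exception raised in a worker
        # is re-raised here instead of being silently dropped.
        list(pool.map(send_batch, batches))
    return sum(len(batch) for batch in batches)
```

Whether this helps depends on where the time goes; if the Solr server itself
is saturated, more client threads will not raise the rate.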

I'm curious what indexing rates others are seeing.

Erik



On Feb 20, 2007, at 2:21 AM, Jack L wrote:

> Hello,
>
> I have a question about Solr's performance of accepting inserts and
> indexing. If I have 10 million documents that I'd like to index, I
> suppose it will take some time to submit them to Solr. Is there any
> faster way to do this than through the web interface?
>
> --
> Best regards,
> Jack




Re: AW: solr performance

2007-02-20 Thread Walter Underwood
Indexing rates depend heavily on document size (text) and pre-indexing
processing. Other things probably matter, too, like number of fields.

My application is indexing 20X faster than Christian's, because I have
small documents (a few hundred bytes) that are extracted from an RDBMS
and submitted in Solr's XML format.

I am probably seeing something close to the maximum rate at 250 docs/s.
This is on a dual-CPU 3 GHz Xeon, Fedora Core 4, JDK 1.5. A fast RAID
would probably make it go faster, but that is about the only speedup
I can think of.
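
To put the rates from this thread in terms of the original question (10
million documents), here is a back-of-the-envelope calculation. It assumes
each rate stays constant over the whole run and ignores commit and optimize
time, so treat the results as rough estimates only.

```python
def hours_to_index(total_docs, docs_per_second):
    """Hours needed to index total_docs at a constant rate."""
    return total_docs / docs_per_second / 3600.0


TOTAL_DOCS = 10_000_000

# Rates as reported in this thread, converted to documents per second.
RATES = {
    "13 docs/s (Christian, with filtering)": 13.0,
    "1000 docs/min (Erik, single-threaded)": 1000.0 / 60.0,
    "250 docs/s (Walter, small documents)": 250.0,
}

for label, rate in RATES.items():
    print(f"{label}: {hours_to_index(TOTAL_DOCS, rate):.1f} hours")
# 13 docs/s (Christian, with filtering): 213.7 hours
# 1000 docs/min (Erik, single-threaded): 166.7 hours
# 250 docs/s (Walter, small documents): 11.1 hours
```

So the spread is roughly from half a day to about nine days, which is why
document size and pre-processing matter more than the transport.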

This has been discussed before, so check the mailing list archives.

wunder
