Re: Faster loading to solr...
Please start a new email thread for this instead of replying to an existing one with a new subject and question.

Sharma, Raghvendra wrote:
> I have been able to load around a million rows/docs in around 5+ minutes.
> The schema contains around 250+ fields. For the moment, I have kept
> everything as string.
> I am sure there are ways to get better loading speeds than this.
> Will the data type matter in loading speeds, or anything else?
> Can someone help me with any tips? Perhaps a best-practices kind of
> document/article. Anything.
> --raghav
Re: Faster loading to solr...
On Thu, Sep 30, 2010 at 10:49 PM, Sharma, Raghvendra wrote:
> I have been able to load around a million rows/docs in around 5+ minutes.
> The schema contains around 250+ fields. For the moment, I have kept
> everything as string.
> I am sure there are ways to get better loading speeds than this.

A million documents with 250 fields in 5 minutes sounds fast to me. As a comparison, we do a million documents with about 60 fields in an hour, using multiple Solr cores. However, this is very likely an apples-to-oranges comparison, as we are pulling large amounts of data from a database over a network. What indexing times are you aiming for?

If you can shard your data, using multiple cores on a single Solr instance and/or multiple Solr instances will speed up your indexing. However, if you want a complete, non-sharded index, you will need to merge the sharded ones.

> Will the data type matter in loading speeds ?? or anything else ?

Data type might matter if there is a lot of processing involved for that data type. E.g., a text field is run through an analyzer chain (a tokenizer plus several filters), whereas a string field is indexed as-is.

> Can someone help me with any tips ? perhaps any best practices kind of
> document/article..
> Anything ..
[...]

The Solr Wiki has many suggestions, e.g., look at the documentation on the DataImportHandler. In our experience, XML import has been very fast. A generic document is difficult to write, as the speed depends on many things, such as the data source, number and type of fields, size of data, etc. Your best bet is to try out several approaches.

Regards,
Gora
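To make the sharding and XML import suggestions above a little more concrete, here is a minimal sketch in Python. It is not taken from the thread. It assumes each document is a dict with an "id" field, and the helper names (shard_for, batches, to_add_xml, plan_updates) are hypothetical. It only builds the XML update messages (the `<add><doc><field name="...">` format) and decides which shard each batch belongs to. Sending them is left out.

```python
import zlib
import xml.etree.ElementTree as ET
from itertools import islice


def shard_for(doc_id, num_shards):
    """Pick a shard with a stable hash so the same id always maps to the same core."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_add_xml(docs):
    """Serialize documents (dicts of field name -> value) as one <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


def plan_updates(docs, num_shards, batch_size):
    """Group documents per shard, then yield (shard, xml) pairs ready to send."""
    per_shard = {i: [] for i in range(num_shards)}
    for doc in docs:
        per_shard[shard_for(doc["id"], num_shards)].append(doc)
    for shard, shard_docs in per_shard.items():
        for chunk in batches(shard_docs, batch_size):
            yield shard, to_add_xml(chunk)


if __name__ == "__main__":
    # Made-up sample rows, purely for illustration.
    rows = [{"id": i, "name": "row %d" % i} for i in range(10)]
    for shard, body in plan_updates(rows, num_shards=2, batch_size=3):
        print(shard, body.count("<doc>"), "docs")
```

Each XML string would then be POSTed to the update handler of the matching core, with a single commit at the end rather than one per batch. The core URLs and the best batch size depend on your setup, so treat both as things to measure rather than fixed values.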
Faster loading to solr...
I have been able to load around a million rows/docs in a little over 5 minutes. The schema contains more than 250 fields. For the moment, I have kept everything as string.

I am sure there are ways to get better loading speeds than this. Will the data type matter for loading speed, or anything else?

Can someone help me with any tips? Perhaps a best-practices document or article. Anything.

--raghav