Re: Faster loading to solr...

2010-10-01 Thread Lance Norskog
Please start a new email thread for this instead of replying to an 
existing one with a new subject and question.



Sharma, Raghvendra wrote:

I have been able to load around a million rows/docs in around 5+ minutes.  The 
schema contains around 250+ fields.  For the moment, I have kept everything as 
string.
I am sure there are ways to get better loading speeds than this.

Will the data type matter in loading speeds ?? or anything else ?

Can someone help me with any tips ? perhaps any best practices  kind of 
document/article..
Anything ..

--raghav..

**
This message may contain confidential or proprietary information intended only 
for the use of the
addressee(s) named above or may contain information that is legally privileged. 
If you are
not the intended addressee, or the person responsible for delivering it to the 
intended addressee,
you are hereby notified that reading, disseminating, distributing or copying 
this message is strictly
prohibited. If you have received this message by mistake, please immediately 
notify us by
replying to the message and delete the original message and any copies 
immediately thereafter.

Thank you.
**
CLLD
   


Re: Faster loading to solr...

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 10:49 PM, Sharma, Raghvendra wrote:
> I have been able to load around a million rows/docs in around 5+ minutes.  
> The schema contains around 250+ fields.  For the moment, I have kept 
> everything as string.
> I am sure there are ways to get better loading speeds than this.

A million documents with 250 fields in 5 minutes sounds fast to
me. As a comparison, we do a million documents with about 60 fields
in an hour, using multiple Solr cores. However, this is very likely an
apples to oranges comparison, as we are pulling large amounts of
data from a database over a network. What indexing times are you
aiming for?

If you can shard your data, indexing into multiple cores on a single
Solr instance and/or into multiple Solr instances will speed up your
indexing. However, if you want a complete, non-sharded index, you will
need to merge the sharded ones afterwards.
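
As a rough illustration of the sharding idea, here is a minimal Python
sketch that splits documents across a fixed number of shards by hashing
the document id. The function names, the "id" key and the number of
shards are assumptions made for the example, not anything from our
setup. Sending each group to its own core or instance is left out.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index in a stable way."""
    # crc32 is deterministic across runs, unlike Python's built-in
    # hash() on strings, so a document always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs, num_shards):
    """Group documents (dicts with an 'id' key) by target shard."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards
```

Each group can then be indexed in parallel by a separate loader
process, which is where the speed-up would come from.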

> Will the data type matter in loading speeds ?? or anything else ?

Data type might matter if there is a lot of processing involved for
that data type. E.g., the text type runs each value through an analyzer
chain (a tokenizer plus several filters), while a string is indexed as is.
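
To make the difference concrete, here is a toy Python model of the two
cases. It is not Solr's analysis code: the regex tokenizer, the
lowercasing step and the stopword list are stand-ins chosen for the
example. It only shows that a string value becomes one verbatim term,
while a text value goes through several passes.

```python
import re

# Hypothetical stopword list, just for this illustration.
STOPWORDS = frozenset({"a", "an", "the", "of"})


def index_as_string(value):
    """A string field: the whole value is kept as a single term."""
    return [value]


def index_as_text(value):
    """A text field: tokenize, lowercase, then drop stopwords."""
    tokens = re.findall(r"\w+", value)           # tokenizer step
    tokens = [t.lower() for t in tokens]         # lowercase filter
    return [t for t in tokens if t not in STOPWORDS]  # stopword filter
```

With 250 fields per document, the extra passes for text fields are done
250 times per document, so the choice of type can add up.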

> Can someone help me with any tips ? perhaps any best practices  kind of 
> document/article..
> Anything ..
[...]

The Solr Wiki has many suggestions, e.g., look at the documentation
on the DataImportHandler. In our experience, XML import has been
very fast. A generic best-practices document is difficult to write, as
the speed depends on many things, such as the data source, the number
and type of fields, the size of the data, etc. Your best bet is to try
out several approaches.
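
For the XML import route, a minimal Python sketch of building Solr XML
update messages in batches might look like the following. The
<add><doc><field name="..."> layout is Solr's XML update format; the
function names and the batch size are assumptions for the example, and
posting each message to your update handler is left out because the URL
depends on your installation.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts as one Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Sending many documents per message and committing once at the end,
rather than after every batch, is usually worth trying first.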

Regards,
Gora


Faster loading to solr...

2010-09-30 Thread Sharma, Raghvendra
I have been able to load around a million rows/docs in around 5+ minutes.  The 
schema contains around 250+ fields.  For the moment, I have kept everything as 
string. 
I am sure there are ways to get better loading speeds than this.

Will the data type matter in loading speeds ?? or anything else ?

Can someone help me with any tips ? perhaps any best practices  kind of 
document/article..
Anything ..

--raghav..
