Re: Merging Solr Indexes

2009-04-01 Thread vivek sar
Thanks Otis. Can you write to the same core (same index) from multiple
threads at the same time? I thought each writer would lock the index
so others cannot write at the same time. I'll try it, though.

Another reason for putting indexes in separate cores was to limit the
index size. Our index can grow by up to 50 GB a day, so I was hoping
writing to smaller indexes in separate cores would be faster, and if
needed I could merge them at a later point (like end of day). I want to
keep daily cores. Isn't this a good idea? How else can I limit the
index size (besides multiple instances or separate boxes)?

Thanks,
-vivek


On Tue, Mar 31, 2009 at 8:28 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Let me start with 4)
 Have you tried simply using multiple threads to send your docs to a single 
 Solr instance/core?  You should get about the same performance as what you 
 are trying with your approach below, but without the headache of managing 
 multiple cores and index merging (not yet possible to do programmatically).

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 31, 2009 1:59:01 PM
 Subject: Merging Solr Indexes

 Hi,

 As part of speeding up the indexing process, I'm thinking of spawning
 multiple threads which will write to different temporary SolrCores.
 Once the indexing process is done, I want to merge all the indexes in
 the temporary cores into a master core. For example, if I want one SolrCore
 per day, then every index cycle I'll spawn 4 threads which will index into
 some temporary index, and once they are done I want to merge all of these
 into the day core. My questions:

 1) I want to use the same schema and solrconfig.xml for all cores
 without duplicating them - how do I do that?
 2) How do I merge the temporary Solr cores into one master core
 programmatically? I've read the wiki on MergingSolrIndexes, but I
 want to do it programmatically (like in Lucene -
 writer.addIndexes(..)) once the temporary indices are done.
 3) Can I remove the temporary indices once the merge process is done?
 4) Is this the right strategy to speed up indexing?

 Thanks,
 -vivek




Re: Merging Solr Indexes

2009-04-01 Thread Otis Gospodnetic

Hi,

Yes, you can write to the same index from multiple threads. You still need to
keep track of the index size manually, whether you create 1 or N indices/cores.
I'd go with a single index first.
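Tracking the index size by hand can be as simple as summing the files under a core's index directory and comparing the total against a limit. The following is a minimal Java sketch under that assumption; IndexSizeCheck, shouldRoll, and the policy of rolling to a new daily core at a byte threshold are illustrative, not Solr APIs. The result is only a snapshot, since Lucene creates and deletes files while merging.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IndexSizeCheck {
    // Sums the sizes of all regular files under an index directory.
    // Not atomic: if a concurrent merge deletes a file mid-walk, the walk
    // or Files.size may throw, so callers should simply retry.
    static long directorySize(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(dir)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long total = 0;
        for (Path p : files) {
            total += Files.size(p);
        }
        return total;
    }

    // Hypothetical rollover policy: start a new core once this one reaches maxBytes.
    static boolean shouldRoll(Path indexDir, long maxBytes) throws IOException {
        return directorySize(indexDir) >= maxBytes;
    }
}
```

A scheduler could call shouldRoll on the current core's index directory periodically and create the next day's core when it returns true.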

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, April 1, 2009 4:26:04 AM
 Subject: Re: Merging Solr Indexes
 
 Thanks Otis. Can you write to the same core (same index) from multiple
 threads at the same time? I thought each writer would lock the index
 so others cannot write at the same time. I'll try it, though.
 
 Another reason for putting indexes in separate cores was to limit the
 index size. Our index can grow by up to 50 GB a day, so I was hoping
 writing to smaller indexes in separate cores would be faster, and if
 needed I could merge them at a later point (like end of day). I want to
 keep daily cores. Isn't this a good idea? How else can I limit the
 index size (besides multiple instances or separate boxes)?
 
 Thanks,
 -vivek
 



Re: Merging Solr Indexes

2009-04-01 Thread Ning Li
There is a JIRA issue on supporting index merging:
https://issues.apache.org/jira/browse/SOLR-1051.
But I agree with Otis that you should go with a single index first.
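The merge support tracked in SOLR-1051 was still an open issue when this was written. Assuming a later Solr release that exposes it as a CoreAdmin `mergeindexes` action taking a target `core` and one or more `indexDir` parameters (as described on the MergingSolrIndexes wiki page; verify the names against your version), the request could be built like this. MergeIndexesUrl is a hypothetical helper, and the base URL, core name, and paths are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

final class MergeIndexesUrl {
    // Builds a CoreAdmin request asking Solr to merge the given index
    // directories into targetCore (assumes the mergeindexes action exists).
    static String build(String solrBaseUrl, String targetCore, List<String> indexDirs) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=mergeindexes");
        query.add("core=" + enc(targetCore));
        for (String dir : indexDirs) {
            query.add("indexDir=" + enc(dir)); // one parameter per source index
        }
        return solrBaseUrl + "/admin/cores?" + query;
    }

    // Query-string encoding; '/' in paths becomes %2F.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Sending the resulting URL with any HTTP client would trigger the merge; the target core typically needs a commit before the merged documents become searchable.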

Cheers,
Ning


On Wed, Apr 1, 2009 at 12:06 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hi,

 Yes, you can write to the same index from multiple threads.  You still need 
 to keep track of the index size manually, whether you create 1 or N 
 indices/cores.  I'd go with a single index first.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




Re: Merging Solr Indexes

2009-03-31 Thread Otis Gospodnetic

Let me start with 4)
Have you tried simply using multiple threads to send your docs to a single Solr 
instance/core?  You should get about the same performance as what you are 
trying with your approach below, but without the headache of managing multiple 
cores and index merging (not yet possible to do programmatically).
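Otis's suggestion amounts to several worker threads feeding one shared client for a single core. On the locking worry in vivek's follow-up: the write lock belongs to the single IndexWriter that Solr keeps open, and Lucene's IndexWriter accepts adds from multiple threads, so the client threads do not each need the lock. The sketch below uses only java.util.concurrent; DocumentSink and ParallelIndexer are hypothetical names, and a real sink would wrap whichever Solr client you use (confirm that it is safe to share between threads). Committing once after indexAll returns is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a client that posts documents to one Solr core.
// Implementations must tolerate calls from several threads at once.
interface DocumentSink<T> {
    void add(List<T> batch) throws Exception;
}

final class ParallelIndexer {
    // Splits docs into batches and sends them to one shared sink from nThreads workers.
    static <T> void indexAll(List<T> docs, int batchSize, int nThreads, DocumentSink<T> sink)
            throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<T> batch = docs.subList(start, Math.min(start + batchSize, docs.size()));
                pending.add(pool.submit(() -> {
                    sink.add(batch);
                    return null;
                }));
            }
            // Wait for every batch; get() rethrows a worker's failure as ExecutionException.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```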

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 31, 2009 1:59:01 PM
 Subject: Merging Solr Indexes
 
 Hi,
 
 As part of speeding up the indexing process, I'm thinking of spawning
 multiple threads which will write to different temporary SolrCores.
 Once the indexing process is done, I want to merge all the indexes in
 the temporary cores into a master core. For example, if I want one SolrCore
 per day, then every index cycle I'll spawn 4 threads which will index into
 some temporary index, and once they are done I want to merge all of these
 into the day core. My questions:
 
 1) I want to use the same schema and solrconfig.xml for all cores
 without duplicating them - how do I do that?
 2) How do I merge the temporary Solr cores into one master core
 programmatically? I've read the wiki on MergingSolrIndexes, but I
 want to do it programmatically (like in Lucene -
 writer.addIndexes(..)) once the temporary indices are done.
 3) Can I remove the temporary indices once the merge process is done?
 4) Is this the right strategy to speed up indexing?
 
 Thanks,
 -vivek