Re: Merging Solr Indexes
Thanks Otis. Could you write to the same core (same index) from multiple threads at the same time? I thought each writer would lock the index so that others cannot write at the same time. I'll try it, though.

Another reason for putting the indexes in separate cores was to limit the index size. Our index can grow up to 50G a day, so I was hoping that writing to smaller indexes in separate cores would be faster, and if needed I can merge them at a later point (like the end of the day). I want to keep daily cores. Isn't this a good idea? How else can I limit the index size (besides multiple instances or separate boxes)?

Thanks,
-vivek

On Tue, Mar 31, 2009 at 8:28 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Let me start with 4). Have you tried simply using multiple threads to send your docs to a single Solr instance/core? You should get about the same performance as with the approach you describe below, but without the headache of managing multiple cores and index merging (not yet possible to do programmatically).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 31, 2009 1:59:01 PM
Subject: Merging Solr Indexes

Hi,

As part of speeding up the indexing process I'm thinking of spawning multiple threads which will write to different temporary SolrCores. Once the indexing process is done I want to merge all the indexes in the temporary cores into a master core. For example, if I want one SolrCore per day, then every index cycle I'll spawn 4 threads which will index into some temporary index, and once they are done I want to merge all of these into the day core.

My questions:

1) I want to use the same schema and solrconfig.xml for all cores without duplicating them - how do I do that?
2) How do I merge the temporary Solr cores into one master core programmatically? I've read the wiki on MergingSolrIndexes, but I want to do it programmatically (like in Lucene - writer.addIndexes(..)) once the temporary indices are done.
3) Can I remove the temporary indices once the merge process is done?
4) Is this the right strategy to speed up indexing?

Thanks,
-vivek
Re: Merging Solr Indexes
Hi,

Yes, you can write to the same index from multiple threads. You still need to keep track of the index size manually, whether you create 1 or N indices/cores. I'd go with a single index first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, April 1, 2009 4:26:04 AM
Subject: Re: Merging Solr Indexes

Thanks Otis. Could you write to the same core (same index) from multiple threads at the same time? I thought each writer would lock the index so that others cannot write at the same time. I'll try it, though.

Another reason for putting the indexes in separate cores was to limit the index size. Our index can grow up to 50G a day, so I was hoping that writing to smaller indexes in separate cores would be faster, and if needed I can merge them at a later point (like the end of the day). I want to keep daily cores. Isn't this a good idea? How else can I limit the index size (besides multiple instances or separate boxes)?

Thanks,
-vivek
Re: Merging Solr Indexes
There is a JIRA issue on supporting index merging: https://issues.apache.org/jira/browse/SOLR-1051. But I agree with Otis that you should go with a single index first.

Cheers,
Ning

On Wed, Apr 1, 2009 at 12:06 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi,

Yes, you can write to the same index from multiple threads. You still need to keep track of the index size manually, whether you create 1 or N indices/cores. I'd go with a single index first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
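For readers on a Solr release that includes the merge support tracked in the issue above, the merge can be triggered over HTTP through the CoreAdmin handler's mergeindexes action. The following is only a sketch of how the request URL might be built: the base URL, the core name "day" and the index directory paths are hypothetical, and whether the action is available depends on your Solr version. The target core still needs a commit afterwards for the merged documents to become visible.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, index_dirs):
    """Build a CoreAdmin mergeindexes URL for one target core.

    base_url, target_core and index_dirs are placeholders for your own
    deployment; each entry in index_dirs is the path to a Lucene index
    directory belonging to one of the temporary cores.
    """
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One indexDir parameter per source index to merge into the target.
    params += [("indexDir", path) for path in index_dirs]
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical layout: two temporary cores merged into the day core.
    print(merge_indexes_url(
        "http://localhost:8983/solr",
        "day",
        ["/data/tmp1/data/index", "/data/tmp2/data/index"],
    ))
```

Once the merge and the commit have both succeeded, the temporary index directories are no longer referenced by the target core.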
Re: Merging Solr Indexes
Let me start with 4). Have you tried simply using multiple threads to send your docs to a single Solr instance/core? You should get about the same performance as with the approach you describe below, but without the headache of managing multiple cores and index merging (not yet possible to do programmatically).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 31, 2009 1:59:01 PM
Subject: Merging Solr Indexes

Hi,

As part of speeding up the indexing process I'm thinking of spawning multiple threads which will write to different temporary SolrCores. Once the indexing process is done I want to merge all the indexes in the temporary cores into a master core. For example, if I want one SolrCore per day, then every index cycle I'll spawn 4 threads which will index into some temporary index, and once they are done I want to merge all of these into the day core.

My questions:

1) I want to use the same schema and solrconfig.xml for all cores without duplicating them - how do I do that?
2) How do I merge the temporary Solr cores into one master core programmatically? I've read the wiki on MergingSolrIndexes, but I want to do it programmatically (like in Lucene - writer.addIndexes(..)) once the temporary indices are done.
3) Can I remove the temporary indices once the merge process is done?
4) Is this the right strategy to speed up indexing?

Thanks,
-vivek
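The single-core suggestion discussed in this thread could look roughly like the sketch below: several client threads post batches of documents to one core's update handler, and a single commit follows at the end. This is an illustration, not anything posted by the participants. The update URL, the batch size and the helper names (build_add_xml, post_xml, index_concurrently) are all assumptions, and the sender is passed in as a parameter so the batching logic can be exercised without a running Solr.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape, quoteattr

# Assumed update endpoint of a single Solr core; adjust to your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Render a list of {field: value} dicts as a Solr XML <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=SOLR_UPDATE_URL):
    """POST one XML message to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_concurrently(docs, threads=4, batch_size=100, send=post_xml):
    """Send batches to the same core from several threads, then commit once."""
    batches = [build_add_xml(batch) for batch in chunks(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all batches and re-raises any worker exception.
        list(pool.map(send, batches))
    send("<commit/>")
    return len(batches)
```

Calling index_concurrently(docs, threads=4) mirrors the four indexing threads from the original question, except that every thread targets the same core, so there is nothing to merge afterwards.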