Re: multiple indexes?
This is very helpful. Thanks a lot, Shawn and Dikchant! So in the default single-core situation, the index would live in data/index, correct?
On Fri, Nov 30, 2012 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote:
On 11/30/2012 10:11 PM, Joe Zhang wrote:
May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes?
When you index or query data, you'll use a base URL specific to the index (core). Everything goes through that base URL, which includes the name of the core:
http://server:port/solr/corename
The file called solr.xml tells Solr about multiple cores. Each core has an instanceDir and a dataDir.
http://wiki.apache.org/solr/CoreAdmin
In the dataDir, Solr will create an index dir, which contains the Lucene index. Here are the file formats for recent versions:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
http://lucene.apache.org/core/3_6_1/fileformats.html
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
Thanks, Shawn
Re: multiple indexes?
Multiple indexes can be set up using the multi-core feature of Solr. Below are the steps:
1. Add the core name and storage location of each core to the $SOLR_HOME/solr.xml file:
<cores adminPath="/admin/cores" defaultCoreName="core-name1">
  <core name="core-name1" instanceDir="core-dir1" />
  <core name="core-name2" instanceDir="core-dir2" />
</cores>
2. Create the core directories specified and the following sub-directories in each of them:
- conf: Contains the configs and schema definition
- lib: Contains the required libraries
- data: Will be created automatically on first run. This will contain the actual index.
While indexing the docs, you specify the core name in the URL as follows:
http://host:port/solr/core-name/update?parameters
You do the same while querying. Please refer to the Solr Wiki; it has the complete details. Hope this helps!
- Dikchant
On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang smartag...@gmail.com wrote:
May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes? Thanks in advance, guys! Joe.
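To make the per-core URL scheme in this message concrete, here is a minimal Python sketch that builds an update URL and a query URL for two named cores. The host, port, query and the `core_url` helper are made up for illustration; `update` and `select` are the usual handler paths, but your own solrconfig.xml decides what is actually available.

```python
from urllib.parse import urlencode


def core_url(base, core, handler, **params):
    """Return <base>/<core>/<handler>, with optional query parameters appended."""
    url = f"{base.rstrip('/')}/{core}/{handler}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical server; the core names are the ones used in solr.xml above.
BASE = "http://localhost:8983/solr"

update_url = core_url(BASE, "core-name1", "update", commit="true")
select_url = core_url(BASE, "core-name2", "select", q="title:solr")

print(update_url)
print(select_url)
```

The only thing that changes between indexes is the core segment of the path, so the choice of index can live in one place in the client.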
Re: multiple indexes?
On 11/30/2012 10:11 PM, Joe Zhang wrote:
May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes?
When you index or query data, you'll use a base URL specific to the index (core). Everything goes through that base URL, which includes the name of the core:
http://server:port/solr/corename
The file called solr.xml tells Solr about multiple cores. Each core has an instanceDir and a dataDir.
http://wiki.apache.org/solr/CoreAdmin
In the dataDir, Solr will create an index dir, which contains the Lucene index. Here are the file formats for recent versions:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
http://lucene.apache.org/core/3_6_1/fileformats.html
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
Thanks, Shawn
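As a rough illustration of the instanceDir/dataDir relationship described in this reply, the following Python sketch computes where the Lucene index directory would be expected for a core. It assumes that an unset dataDir falls back to a "data" directory under the instanceDir; the paths and the `index_dir` helper are hypothetical, so check your own solr.xml and solrconfig.xml.

```python
from pathlib import PurePosixPath


def index_dir(solr_home, instance_dir, data_dir=None):
    """Return the directory where the Lucene index files are expected."""
    instance = PurePosixPath(solr_home) / instance_dir
    # Assumption: with no explicit dataDir, Solr uses <instanceDir>/data.
    data = PurePosixPath(data_dir) if data_dir else instance / "data"
    # Solr creates an "index" directory inside the dataDir.
    return data / "index"


# Hypothetical locations, for illustration only.
default_location = index_dir("/opt/solr", "core-dir1")
custom_location = index_dir("/opt/solr", "core-dir2", data_dir="/var/solr/core2")

print(default_location)
print(custom_location)
```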
Re: Multiple indexes
Your data is being used to build an inverted index rather than being stored as a set of records. De-normalising is fine in most cases. What is your use case which requires a normalised set of indices?
2011/6/18 François Schiettecatte fschietteca...@gmail.com:
You would need to run two independent searches and then 'join' the results. It is best not to apply a 'sql' mindset to SOLR when it comes to (de)normalization: whereas you strive for normalization in SQL, that is usually counter-productive in SOLR. For example, I am working on a project with 30+ normalized tables, but only 4 cores. Perhaps describing what you are trying to achieve would give us greater insight and let us make more concrete recommendations?
Cheers
François
On Jun 18, 2011, at 2:36 PM, shacky wrote:
On 18 June 2011 20:27, François Schiettecatte fschietteca...@gmail.com wrote:
Sure.
So I can have some searches similar to JOIN on MySQL? The problem is that I need at least two tables in which to search data.
Re: Multiple indexes
2011/6/15 Edoardo Tosca e.to...@sourcesense.com: Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin Can I do concurrent searches on multiple cores?
Re: Multiple indexes
Sure. François On Jun 18, 2011, at 2:25 PM, shacky wrote: 2011/6/15 Edoardo Tosca e.to...@sourcesense.com: Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin Can I do concurrent searches on multiple cores?
Re: Multiple indexes
On 18 June 2011 20:27, François Schiettecatte fschietteca...@gmail.com wrote:
Sure.
So I can have some searches similar to JOIN on MySQL? The problem is that I need at least two tables in which to search data.
Re: Multiple indexes
You would need to run two independent searches and then 'join' the results. It is best not to apply a 'sql' mindset to SOLR when it comes to (de)normalization: whereas you strive for normalization in SQL, that is usually counter-productive in SOLR. For example, I am working on a project with 30+ normalized tables, but only 4 cores. Perhaps describing what you are trying to achieve would give us greater insight and let us make more concrete recommendations?
Cheers
François
On Jun 18, 2011, at 2:36 PM, shacky wrote:
On 18 June 2011 20:27, François Schiettecatte fschietteca...@gmail.com wrote:
Sure.
So I can have some searches similar to JOIN on MySQL? The problem is that I need at least two tables in which to search data.
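For what "run two independent searches and then 'join' the results" could look like on the client side, here is a minimal Python sketch. It works on plain lists of dictionaries standing in for the document lists returned by two separate cores; the field names, sample documents and the `join_results` helper are all invented for illustration, and no Solr call is made.

```python
def join_results(left_docs, right_docs, left_key, right_key):
    """Inner-join two result lists where left[left_key] == right[right_key]."""
    # Index the right-hand results by join key so each lookup is cheap.
    by_key = {}
    for doc in right_docs:
        by_key.setdefault(doc[right_key], []).append(doc)

    joined = []
    for doc in left_docs:
        for match in by_key.get(doc[left_key], []):
            # Prefix right-hand fields so they cannot overwrite left-hand ones.
            merged = dict(doc)
            merged.update({f"r_{name}": value for name, value in match.items()})
            joined.append(merged)
    return joined


# Made-up documents standing in for the results of two separate core queries.
authors = [{"id": "a1", "name": "Ann"}, {"id": "a2", "name": "Bob"}]
books = [
    {"id": "b1", "author_id": "a1", "title": "Solr Notes"},
    {"id": "b2", "author_id": "a1", "title": "Lucene Notes"},
]

rows = join_results(authors, books, "id", "author_id")
for row in rows:
    print(row["name"], "-", row["r_title"])
```

This only joins whatever each search returned, so paging and relevance ordering across the two result sets remain the client's problem, which is one reason denormalizing into a single index is usually suggested instead.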
RE: Multiple indexes
I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type).
I wonder if, in this precise case, it wouldn't be pertinent to have a single index with the various document types each having their own set of fields. Isn't TF calculated field by field?
RE: Multiple indexes
(for example if you need separate TFs for each document type).
I wonder if, in this precise case, it wouldn't be pertinent to have a single index with the various document types each having their own set of fields. Isn't TF calculated field by field?
Oh, you are right :) So I will start testing with one mixed-type index and perhaps use IndexReaderFactory afterwards for comparison.
Thanks,
Kai Gülzau
RE: Multiple indexes
Are there any plans to support a kind of federated search in a future Solr version? I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type).
I am aware of
http://wiki.apache.org/solr/DistributedSearch
and a workaround to do federated search with sharding
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
but this seems to be too much network and maintenance overhead.
Perhaps it is worth a try to use an IndexReaderFactory which returns a Lucene MultiReader? Is the IndexReaderFactory still experimental?
https://issues.apache.org/jira/browse/SOLR-1366
Regards,
Kai Gülzau
-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, June 15, 2011 8:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple indexes
Next, however, I predict you're going to ask how you do a 'join' or otherwise query across both these cores at once. You can't do that in Solr.
On 6/15/2011 1:00 PM, Frank Wesemann wrote:
You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin
Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.
Re: Multiple indexes
Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin On Wed, Jun 15, 2011 at 5:55 PM, shacky shack...@gmail.com wrote: Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye. -- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: Multiple indexes
You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin
Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.
--
With kind regards,
Frank Wesemann
Fotofinder GmbH, VAT ID DE812854514
Software Development
Web: http://www.fotofinder.com/
Potsdamer Str. 96, 10785 Berlin
Tel: +49 30 25 79 28 90
Fax: +49 30 25 79 28 999
Registered office: Berlin, Amtsgericht Berlin Charlottenburg (HRB 73099)
Managing director: Ali Paczensky
Re: Multiple indexes
Next, however, I predict you're going to ask how you do a 'join' or otherwise query across both these cores at once. You can't do that in Solr.
On 6/15/2011 1:00 PM, Frank Wesemann wrote:
You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin
Hi. How to have multiple indexes in SOLR, with different fields and different types of data? Thank you very much! Bye.
Re: Multiple indexes inside a single core
Here's the Jira issue for the distributed search issue. https://issues.apache.org/jira/browse/SOLR-1632
I tried applying this patch but get the same error that is posted in the discussion section for that issue. I will be glad to help on this one too.
On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote:
Ah, I should have read more carefully... I remember this being discussed on the dev list, and I thought there might be a Jira attached but I sure can't find it. If you're willing to work on it, you might hop over to the Solr dev list and start a discussion, maybe ask for a place to start. I'm sure some of the devs have thought about this... If nobody on the dev list says "There's already a JIRA on it", then you should open one. The Jira issues are generally preferred when you start getting into design because the comments are preserved for the next person who tries the idea or makes changes, etc.
Best
Erick
On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote:
Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it. I'm not sure if there is a clean place for it.
Ben
On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:
It seems to me that multiple cores are along the lines you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable.
This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
HTH
Erick
On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes.
I know this has been brought up a number of times in previous posts and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher and updates would be isolated to a subset of the data.
The other problem is that we will likely need to shard this large single index and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them.
Thanks in advance,
Ben
Re: Multiple indexes inside a single core
Ah, I should have read more carefully... I remember this being discussed on the dev list, and I thought there might be a Jira attached but I sure can't find it. If you're willing to work on it, you might hop over to the Solr dev list and start a discussion, maybe ask for a place to start. I'm sure some of the devs have thought about this... If nobody on the dev list says "There's already a JIRA on it", then you should open one. The Jira issues are generally preferred when you start getting into design because the comments are preserved for the next person who tries the idea or makes changes, etc.
Best
Erick
On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote:
Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it. I'm not sure if there is a clean place for it.
Ben
On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:
It seems to me that multiple cores are along the lines you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable.
This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
HTH
Erick
On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes.
I know this has been brought up a number of times in previous posts and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher and updates would be isolated to a subset of the data.
The other problem is that we will likely need to shard this large single index and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them.
Thanks in advance,
Ben
Re: Multiple indexes inside a single core
It seems to me that multiple cores are along the lines you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable.
This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
HTH
Erick
On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes.
I know this has been brought up a number of times in previous posts and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher and updates would be isolated to a subset of the data.
The other problem is that we will likely need to shard this large single index and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them.
Thanks in advance,
Ben
Re: Multiple indexes inside a single core
Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it. I'm not sure if there is a clean place for it.
Ben
On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:
It seems to me that multiple cores are along the lines you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable.
This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
HTH
Erick
On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes.
I know this has been brought up a number of times in previous posts and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes while another is only updated every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, and will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher and updates would be isolated to a subset of the data.
The other problem is that we will likely need to shard this large single index and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them.
Thanks in advance,
Ben
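To see why independently scored cores of very different sizes give skewed scores, a small worked example may help. The sketch below uses the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)) as an assumption about the similarity in use; the document frequencies are invented, and only the 20 million and 200 document counts come from the message above.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency in the classic TF-IDF style (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Made-up document frequencies for the same term in two cores of very different size.
idf_big_core = classic_idf(doc_freq=1000, num_docs=20_000_000)
idf_small_core = classic_idf(doc_freq=100, num_docs=200)

# What a searcher with statistics over both indexes combined would compute.
idf_combined = classic_idf(doc_freq=1100, num_docs=20_000_200)

print(f"big core:   {idf_big_core:.2f}")
print(f"small core: {idf_small_core:.2f}")
print(f"combined:   {idf_combined:.2f}")
```

With these numbers the same term is weighted several times more heavily in the large core than in the small one, so raw scores from the two cores are not directly comparable unless the term statistics are shared.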
Re: Multiple Indexes and relevance ranking question
The score of a document has no scale: it only has meaning against other scores in the same query. Solr does not rank these documents correctly. Without sharing the TF/DF information across the shards, it cannot. If the shards each have a lot of the same kind of document, this problem averages out. That is, the statistical fingerprint across the shards is similar enough that each index gives the same numerical range. Yes, this is hand-wavy, and we don't have a measuring tool that verifies this assertion.
Lance
Valli Indraganti wrote:
I am new to Solr and search technologies. I am playing around with multiple indexes. I configured Solr for Tomcat and created two Tomcat fragments so that two Solr webapps listen on port 8080 in Tomcat. I have created two separate indexes using each webapp successfully. My documents are very primitive. Below is the structure. I have four such documents with different doc ids and an increasing number of occurrences of the word Hello, corresponding to the name of the document (this is only to make my analysis of the results easier). Documents one and two are in shard 1, and three and four are in shard 2. Obviously, document two is ranked higher when queried against that index (for the word Hello), and document four is ranked higher when queried against the second index. When using the shards parameter, the scores remain unaltered. My question is, if the distributed search does not consider IDF, how is it able to rank these documents correctly? Or do I not have the indexes truly distributed? Is something wrong with my term distribution?
<add>
  <doc>
    <field name="id">Valli1</field>
    <field name="name">One</field>
    <field name="text">Hello! This is a test document testing relevancy scores.</field>
  </doc>
</add>
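For reference, a distributed query of the kind discussed in this thread is an ordinary select request with a shards parameter listing the indexes to consult. The Python sketch below only builds such a URL; the webapp names solr1 and solr2 and the `distributed_query_url` helper are assumptions for illustration (the message only says that two webapps listen on port 8080).

```python
from urllib.parse import urlencode


def distributed_query_url(entry_point, shard_list, query):
    """Build a select URL that asks one Solr webapp to query several shards."""
    params = {
        "q": query,
        "fl": "id,score",  # return the score so per-shard values can be compared
        "shards": ",".join(shard_list),  # host:port/path entries, without a scheme
    }
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"{entry_point}/select?" + urlencode(params, safe=":/,")


# Hypothetical webapp names for the two Tomcat fragments.
shards = ["localhost:8080/solr1", "localhost:8080/solr2"]
url = distributed_query_url("http://localhost:8080/solr1", shards, "Hello")
print(url)
```

Comparing the scores returned this way with those from querying each webapp directly is a simple check on whether the shards contribute unchanged local scores, as the question describes.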
RE: Multiple Indexes
Not sure if this will work for you, but you can have 3 cores (using multicore) and have your Solr server or the client decide which core it should be hitting. With this approach you can have a separate schema.xml and solrconfig.xml for each of the cores and, obviously, a separate index in each core.
-Raghu
-----Original Message-----
From: anshuljohri [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 07, 2008 5:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple Indexes
Both the cases are there. As I said, I need to index 3 indexes. 2 indexes have the same schema but the other one has a different one. More specifically: I have 3 indexes, of which 2 have the same data model but are indexed differently. So I need to fire queries from the backend on individual indexes based on input. The 3rd index has a different schema as well. Again, the query will be fired on this index based on input. So my question is how I can handle this situation. Through configuring multiple instances of Solr/Tomcat? If yes, then how? Otherwise, what are the other ways on Solr 1.2?
-Anshul
zayhen wrote:
Oh, sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model?
2008/8/7 anshuljohri [EMAIL PROTECTED]
Thanks zayhen for such a quick response, but I am not talking about sharding. I have a requirement of indexing 3 indexes. I need to query different indexes based on input.
-Anshul
zayhen wrote:
2008/8/7 anshuljohri [EMAIL PROTECTED]
Hi everybody! I need to create multiple indexes, let's say 3, due to a project requirement. And the query will be fired from the backend on different indexes based on input. I can't do it in one index with the help of the fq parameter. I have already thought about it, but that's of no use.
I assume you are talking about sharding. Go 1.3-dev. It runs smoothly in my environment!
So I searched a lot in this forum but couldn't get a satisfactory answer. I found that there are 3 ways to do it, of which 2 are not applicable in version 1.2. So I have to go with the multiple Tomcat instances option, as in a multiple webapps config. But I am still not clear whether I need 3 different solrconfig.xml and schema.xml files or whether I can do it with symlinks. Is there any tutorial or some reading material for this? Can anybody please help me out? Thanks in advance
-Anshul Johri
--
View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
Alexander Ramos Jardim - RPG da Ilha
--
View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
Alexander Ramos Jardim - RPG da Ilha
--
View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880973.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Indexes
Try putting them all in one index. Your fields can be s1_name for schema 1, s2_name for schema 2, and so on. The only reason to have separate indexes is if each group of content has a different update schedule and if you have high traffic (over 1M queries/day). wunder On 8/8/08 8:19 AM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Not sure if this will work for you but you can have 3 cores (using multicore) and have your solr server or the client decide on to which core it should be hitting. With this approach your can have separate schema.xml solrconfig.xml for each of the cores obviously separate index in each core. -Raghu -Original Message- From: anshuljohri [mailto:[EMAIL PROTECTED] Sent: Thursday, August 07, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Indexes Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. 
I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha
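The single-index suggestion in this message (fields named s1_name for schema 1, s2_name for schema 2, and so on) can be sketched as a small client-side step before indexing. The Python below is only an illustration: the `prefix_fields` helper, the choice to leave id unprefixed and the sample documents are assumptions, not something prescribed in the thread.

```python
def prefix_fields(doc, schema_prefix, shared=("id",)):
    """Rename fields to <prefix>_<field>, leaving shared fields such as id alone."""
    renamed = {}
    for name, value in doc.items():
        if name in shared:
            renamed[name] = value
        else:
            renamed[f"{schema_prefix}_{name}"] = value
    return renamed


# Made-up documents from two different data models going into one index.
product = prefix_fields({"id": "p-1", "name": "Widget", "price": 9.5}, "s1")
article = prefix_fields({"id": "a-1", "name": "Release notes"}, "s2")

print(product)
print(article)
```

Queries then target s1_name or s2_name explicitly, so documents from the different data models do not match each other's fields even though they share one index.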
Re: Multiple Indexes
I meant update frequency more than schedule. If one group of content is updated once per day and the another every ten minutes, and most of the traffic is going to the slow collection, splitting them could help. wunder On 8/8/08 8:25 AM, Walter Underwood [EMAIL PROTECTED] wrote: Try putting them all in one index. Your fields can be s1_name for schema 1, s2_name for schema 2, and so on. The only reason to have separate indexes is if each group of content has a different update schedule and if you have high traffic (over 1M queries/day). wunder On 8/8/08 8:19 AM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Not sure if this will work for you but you can have 3 cores (using multicore) and have your solr server or the client decide on to which core it should be hitting. With this approach your can have separate schema.xml solrconfig.xml for each of the cores obviously separate index in each core. -Raghu -Original Message- From: anshuljohri [mailto:[EMAIL PROTECTED] Sent: Thursday, August 07, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Indexes Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. 
Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha
Re: Multiple Indexes
Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Indexes
Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880771.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim
Re: Multiple Indexes
Both the cases are there. As i said i need to index 3 indexes. So 2 indexes have same schema but other one has different. More specification is like this -- I have 3 indexes. In which 2 indexes have same data model but the way these are indexed is different. So i need to fire query from backend on individual indexes based on input. But the 3rd index has diff schema also. Again the query will be fired on this index based on input. So my question is how can i handle this situation. Thru configuring multiple instances of Solr/Tomcat if ya than how? else what are the other ways on Solr 1.2 -Anshul zayhen wrote: Oh, Sorry! Can you be a little more specific? Do these indexes have different schemas, or do they represent the same data model? 2008/8/7 anshuljohri [EMAIL PROTECTED] Thanks zayhen for such a quick response but am not talking about sharding. I have requirement of indexing 3 indexes. Need to do query on diff indexes based on input. -Anshul zayhen wrote: 2008/8/7 anshuljohri [EMAIL PROTECTED] Hi everybody! I need to create multiple indexes lets say 3 due to project requirement. And the query will be fired from backend on different indexes based on input. I can't do it in one index with the help of fq parameter. As i have already thought on it but thats of no use. I assume you are talking about sharding. Go 1.3-dev. It runs smooth in my environment! So i searched a lot in this forum but couldn't get satisfactory answer. I found that there are 3 ways to do it. In which 2 ways are not applicable in 1.2 version. So i have to go with Multiple Tomcat instances option as in multiple webapps config. But still am not clear whether I need 3 diff solrConfig.xml schema.xml or I can do it with symlinks. Is there any tutorial or some reading material for this. Can anybody plz help me out? Thanks is advance -Anshul Johri -- View this message in context: http://www.nabble.com/Multiple-Indexes-tp18880284p18880284.html Sent from the Solr - User mailing list archive at Nabble.com. 
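Querying a different index "based on input", as described above, mostly comes down to choosing a base URL in the backend. Here is a minimal sketch in Python, assuming each index is reachable at its own base URL (separate webapps on Solr 1.2, or separate cores on a later version). The input kinds, host, port and webapp names are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the kind of input to the Solr base URL of the
# index that should answer it. With multiple webapps each index has its own
# context path; with multicore the last path segment would be the core name.
INDEX_URLS = {
    "kind_a": "http://localhost:8080/solr-a",
    "kind_b": "http://localhost:8080/solr-b",
    "kind_c": "http://localhost:8080/solr-c",
}


def select_url(input_kind, query):
    """Build the /select URL for the index that handles input_kind."""
    base = INDEX_URLS[input_kind]
    return f"{base}/select?{urlencode({'q': query})}"


print(select_url("kind_c", "title:solr"))
```

Since the routing lives entirely in the client, the same code works whether the three indexes share a schema or not.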
RE: Multiple indexes
Hello, Until now, I've used two instances of Solr, one for each of my collections; it works fine, but I wonder if there is an advantage to using multiple indexes in one instance over several instances with one index each? Note that the two indexes have different schema.xml files. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for a way to use multiple indexes in a single Solr instance. I saw that patch 215 is available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
Re: Multiple indexes
The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
Re: Multiple indexes
Here is my situation. I have 6 millions articles indexed and adding about 10k articles everyday. If I maintain only one index, whenever the daily feeding is running, it consumes the heap area and causes FGC. I am thinking the way to have multiple indexes - one is for ongoing querying service and one is for update. Once update is done, switch the index by automatically and/or my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
Re: Multiple indexes
Just use the standard collection distribution stuff. That is what it is made for! http://wiki.apache.org/solr/CollectionDistribution Alternatively, open up two indexes using the same config/dir -- do your indexing on one and the searching on the other. When indexing is done (or finishes a big chunk), send <commit/> to the 'searching' one and it will see the new stuff. ryan Jae Joo wrote: Here is my situation. I have 6 millions articles indexed and adding about 10k articles everyday. If I maintain only one index, whenever the daily feeding is running, it consumes the heap area and causes FGC. I am thinking the way to have multiple indexes - one is for ongoing querying service and one is for update. Once update is done, switch the index by automatically and/or my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for single Solr instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
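For the second option above (index on one instance, then tell the searching instance to commit), the commit is just an XML message posted to that instance's update handler. A minimal sketch in Python using only the standard library follows; the host, port and path in the example URL are assumptions about a local setup, and the request is only built here, not sent.

```python
import urllib.request


def build_commit_request(update_url):
    """Build (but do not send) a POST carrying Solr's <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical URL of the 'searching' instance's update handler.
req = build_commit_request("http://localhost:8983/solr/update")
print(req.get_method(), req.full_url, req.data)

# To actually send it once indexing has finished:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Calling this from the indexing job after each big chunk would give the switch-over behaviour Jae asked about without maintaining two separate copies of the index.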
Re: Multiple indexes
I have built the master Solr instance and indexed some files. When I run snapshooter, it fails with the error below. Did I miss something?
snapshooter -d data/index (run in the solr/bin directory)
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2007/11/12 12:38:40 taking snapshot /solr/master/solr/data/index/snapshot.20071112123840
+ [[ -n '' ]]
+ mv /solr/master/solr/data/index/temp-snapshot.20071112123840 /solr/master/solr/data/index/snapshot.20071112123840
mv: cannot access /solr/master/solr/data/index/temp-snapshot.20071112123840
Jae On Nov 12, 2007 9:09 AM, Ryan McKinley [EMAIL PROTECTED] wrote: just use the standard collection distribution stuff. That is what it is made for! http://wiki.apache.org/solr/CollectionDistribution Alternatively, open up two indexes using the same config/dir -- do your indexing on one and the searching on the other. when indexing is done (or finishes a big chunk) send <commit/> to the 'searching' one and it will see the new stuff. ryan Jae Joo wrote: Here is my situation. I have 6 millions articles indexed and adding about 10k articles everyday. If I maintain only one index, whenever the daily feeding is running, it consumes the heap area and causes FGC. I am thinking the way to have multiple indexes - one is for ongoing querying service and one is for update. Once update is done, switch the index by automatically and/or my application. Thanks, Jae joo On Nov 12, 2007 8:48 AM, Ryan McKinley [EMAIL PROTECTED] wrote: The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production solr servers, I don't think there is much reason for it. You may want to split the two indexes on to two machines. 
You may want to run each index in a separate JVM (so if one crashes, the other does not) Maintaining 2 indexes is pretty easy, if that was a larger number or you need to create indexes for each user in a system then it would be worth investigating the multi-core setup (it is still in development) ryan Pierre-Yves LANDRON wrote: Hello, Until now, i've used two instance of solr, one for each of my collections ; it works fine, but i wonder if there is an advantage to use multiple indexes in one instance over several instances with one index each ? Note that the two indexes have different schema.xml. Thanks. PL Date: Thu, 8 Nov 2007 18:05:43 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Multiple indexes Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo _ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vistamkt=en-USform=QBRE
Re: Multiple indexes
I've had good luck with MultiCore, but you have to sync trunk from svn and apply the most recent patch in SOLR-350. https://issues.apache.org/jira/browse/SOLR-350 -jrr Jae Joo wrote: Hi, I am looking for the way to utilize the multiple indexes for signle sole instance. I saw that there is the patch 215 available and would like to ask someone who knows how to use multiple indexes. Thanks, Jae Joo
RE: Multiple indexes
Is there functionality for partitioning Solr indexes onto multiple machines? For this to work, I suppose that Solr would have to combine the results from the various machines. I think Nutch does this with the distributed searcher functionality. -Nathan -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, August 30, 2007 11:44 AM To: solr-user@lucene.apache.org Subject: Re: Multiple indexes On 29-Aug-07, at 10:21 PM, James liu wrote: Does it affect with doc size? for example 2 billion docs, 10k doc2 billion docs, but doc size is 10m. There might be other places that have 2G limit (see lucene index format docs), but many things are vints and can grow larger. Of course you will hit physical limits of your machine long before you can achieve your hypothetical situation: that's 20,000 Tb, which is many, many times the size of a complete internet crawl. -Mike 2007/8/30, Mike Klaas [EMAIL PROTECTED]: 2 billion docs (signed int). On 29-Aug-07, at 6:24 PM, James liu wrote: what is the limits for Lucene and Solr. 100m, 1000m, 5000m or other number docs? 2007/8/24, Walter Underwood [EMAIL PROTECTED]: It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? 
Thanks, Jae Joo
<doc>
<field name="trade2"></field>
<field name="company_name">Unlimi-Tech Software Inc</field>
<field name="phys_stabrv">ON</field>
<field name="trade4"></field>
<field name="status_id_descr">Single Location</field>
<field name="trade3"></field>
<field name="phys_country">Canada</field>
<field name="phys_zip">K1C 4R1</field>
<field name="phys_city">Ottawa</field>
<field name="phys_state">Ontario</field>
<field name="sic2">G2_Computer Software</field>
<field name="phys_address">1447a Youville Dr</field>
<field name="sic1">G_Technology &amp; Communications</field>
<field name="duns_number">203439018</field>
<field name="trade1"></field>
<field name="phys_county">Carleton</field>
<field name="trade5"></field>
<field name="status_id_rank">30</field>
<field name="sic4">G2173_Computer Programming Services</field>
<field name="sic8">G217308D_Computer software development</field>
</doc>
-- regards jl -- regards jl
Re: Multiple indexes
On 30-Aug-07, at 10:57 AM, Nathaniel E. Powell wrote: Is there functionality for partitioning Solr indexes onto multiple machines? For this to work, I suppose that Solr would have to combine the results from the various machines. I think Nutch does this with the distributed searcher functionality. Not currently developed. See http://wiki.apache.org/solr/FederatedSearch and http://issues.apache.org/jira/browse/SOLR-303 -Mike -Nathan -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, August 30, 2007 11:44 AM To: solr-user@lucene.apache.org Subject: Re: Multiple indexes On 29-Aug-07, at 10:21 PM, James liu wrote: Does it affect with doc size? for example 2 billion docs, 10k doc2 billion docs, but doc size is 10m. There might be other places that have 2G limit (see lucene index format docs), but many things are vints and can grow larger. Of course you will hit physical limits of your machine long before you can achieve your hypothetical situation: that's 20,000 Tb, which is many, many times the size of a complete internet crawl. -Mike 2007/8/30, Mike Klaas [EMAIL PROTECTED]: 2 billion docs (signed int). On 29-Aug-07, at 6:24 PM, James liu wrote: what is the limits for Lucene and Solr. 100m, 1000m, 5000m or other number docs? 2007/8/24, Walter Underwood [EMAIL PROTECTED]: It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? 
Thanks, Jae Joo
<doc>
<field name="trade2"></field>
<field name="company_name">Unlimi-Tech Software Inc</field>
<field name="phys_stabrv">ON</field>
<field name="trade4"></field>
<field name="status_id_descr">Single Location</field>
<field name="trade3"></field>
<field name="phys_country">Canada</field>
<field name="phys_zip">K1C 4R1</field>
<field name="phys_city">Ottawa</field>
<field name="phys_state">Ontario</field>
<field name="sic2">G2_Computer Software</field>
<field name="phys_address">1447a Youville Dr</field>
<field name="sic1">G_Technology &amp; Communications</field>
<field name="duns_number">203439018</field>
<field name="trade1"></field>
<field name="phys_county">Carleton</field>
<field name="trade5"></field>
<field name="status_id_rank">30</field>
<field name="sic4">G2173_Computer Programming Services</field>
<field name="sic8">G217308D_Computer software development</field>
</doc>
-- regards jl -- regards jl
Re: Multiple indexes
OK...I see...thk u ,mike. 2007/8/31, Mike Klaas [EMAIL PROTECTED]: On 29-Aug-07, at 10:21 PM, James liu wrote: Does it affect with doc size? for example 2 billion docs, 10k doc2 billion docs, but doc size is 10m. There might be other places that have 2G limit (see lucene index format docs), but many things are vints and can grow larger. Of course you will hit physical limits of your machine long before you can achieve your hypothetical situation: that's 20,000 Tb, which is many, many times the size of a complete internet crawl. -Mike 2007/8/30, Mike Klaas [EMAIL PROTECTED]: 2 billion docs (signed int). On 29-Aug-07, at 6:24 PM, James liu wrote: what is the limits for Lucene and Solr. 100m, 1000m, 5000m or other number docs? 2007/8/24, Walter Underwood [EMAIL PROTECTED]: It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? Thanks, Jae Joo docfield name=trade2/field field name=company_nameUnlimi-Tech Software Inc/field field name=phys_stabrvON/field field name=trade4/field field name=status_id_descrSingle Location/field field name=trade3/field field name=phys_countryCanada/field field name=phys_zipK1C 4R1/field field name=phys_cityOttawa/field field name=phys_stateOntario/field field name=sic2G2_Computer Software/field field name=phys_address1447a Youville Dr/field field name=sic1G_Technology amp; Communications/field field name=duns_number203439018/field field name=trade1/field field name=phys_countyCarleton/field field name=trade5/field field name=status_id_rank30/field field name=sic4G2173_Computer Programming Services/field field name=sic8G217308D_Computer software development/field /doc -- regards jl -- regards jl -- regards jl
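The numbers in this exchange are easy to check. The sketch below, in Python, reproduces the "2 billion docs (signed int)" limit and the 20,000 TB figure for the hypothetical case. It assumes "10k" and "10m" mean 10 KB and 10 MB per document and uses decimal units (1 TB = 10^12 bytes); both are interpretations, not something stated in the thread.

```python
# Largest value of a signed 32-bit integer, the document-count limit
# mentioned in the thread ("2 billion docs (signed int)").
MAX_DOCS = 2**31 - 1

TB = 10**12  # decimal terabyte, an assumption about the units intended


def total_bytes(num_docs, bytes_per_doc):
    """Raw content size for num_docs documents of bytes_per_doc each."""
    return num_docs * bytes_per_doc


docs = 2_000_000_000
small_tb = total_bytes(docs, 10 * 10**3) // TB  # assumed 10 KB per document
large_tb = total_bytes(docs, 10 * 10**6) // TB  # assumed 10 MB per document

print(MAX_DOCS)   # 2147483647
print(small_tb)   # 20
print(large_tb)   # 20000
```

So the document count alone stays within the limit in both cases; it is the raw volume of the 10 MB case that makes it impractical on a single machine.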
Re: Multiple indexes
2 billion docs (signed int). On 29-Aug-07, at 6:24 PM, James liu wrote: what is the limits for Lucene and Solr. 100m, 1000m, 5000m or other number docs? 2007/8/24, Walter Underwood [EMAIL PROTECTED]: It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? Thanks, Jae Joo docfield name=trade2/field field name=company_nameUnlimi-Tech Software Inc/field field name=phys_stabrvON/field field name=trade4/field field name=status_id_descrSingle Location/field field name=trade3/field field name=phys_countryCanada/field field name=phys_zipK1C 4R1/field field name=phys_cityOttawa/field field name=phys_stateOntario/field field name=sic2G2_Computer Software/field field name=phys_address1447a Youville Dr/field field name=sic1G_Technology amp; Communications/field field name=duns_number203439018/field field name=trade1/field field name=phys_countyCarleton/field field name=trade5/field field name=status_id_rank30/field field name=sic4G2173_Computer Programming Services/field field name=sic8G217308D_Computer software development/field /doc -- regards jl
Re: Multiple indexes
Does it affect with doc size? for example 2 billion docs, 10k doc2 billion docs, but doc size is 10m. 2007/8/30, Mike Klaas [EMAIL PROTECTED]: 2 billion docs (signed int). On 29-Aug-07, at 6:24 PM, James liu wrote: what is the limits for Lucene and Solr. 100m, 1000m, 5000m or other number docs? 2007/8/24, Walter Underwood [EMAIL PROTECTED]: It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? Thanks, Jae Joo docfield name=trade2/field field name=company_nameUnlimi-Tech Software Inc/field field name=phys_stabrvON/field field name=trade4/field field name=status_id_descrSingle Location/field field name=trade3/field field name=phys_countryCanada/field field name=phys_zipK1C 4R1/field field name=phys_cityOttawa/field field name=phys_stateOntario/field field name=sic2G2_Computer Software/field field name=phys_address1447a Youville Dr/field field name=sic1G_Technology amp; Communications/field field name=duns_number203439018/field field name=trade1/field field name=phys_countyCarleton/field field name=trade5/field field name=status_id_rank30/field field name=sic4G2173_Computer Programming Services/field field name=sic8G217308D_Computer software development/field /doc -- regards jl -- regards jl
Re: Multiple indexes
It should work fine to index them and search them. 13 million docs is not even close to the limits for Lucene and Solr. Have you had problems? wunder On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote: Is there any solution to handle 13 millions document shown as below? Each document is not big, but the number of ones is 13 million. Any way to utilize the multiple indexes? Thanks, Jae Joo
<doc>
<field name="trade2"></field>
<field name="company_name">Unlimi-Tech Software Inc</field>
<field name="phys_stabrv">ON</field>
<field name="trade4"></field>
<field name="status_id_descr">Single Location</field>
<field name="trade3"></field>
<field name="phys_country">Canada</field>
<field name="phys_zip">K1C 4R1</field>
<field name="phys_city">Ottawa</field>
<field name="phys_state">Ontario</field>
<field name="sic2">G2_Computer Software</field>
<field name="phys_address">1447a Youville Dr</field>
<field name="sic1">G_Technology &amp; Communications</field>
<field name="duns_number">203439018</field>
<field name="trade1"></field>
<field name="phys_county">Carleton</field>
<field name="trade5"></field>
<field name="status_id_rank">30</field>
<field name="sic4">G2173_Computer Programming Services</field>
<field name="sic8">G217308D_Computer software development</field>
</doc>
Re: Multiple indexes?
Why not just store an additional object_type field which differentiates between the actual type of data you are looking for? So if you're looking for some shoes: (size:8 AND color:'blue') AND object_type:'shoe' Or if you're searching on brands (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' I apologize if I misunderstood your question. /cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
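On the query side, the object_type idea above amounts to wrapping whatever the user searched for and AND-ing the type clause onto it. A rough sketch in Python; the function names are invented for the example, and object_type is the field name proposed in this message, which would have to exist in schema.xml.

```python
from urllib.parse import urlencode


def with_type(user_query, object_type):
    """Restrict user_query to documents of one object_type."""
    return f"({user_query}) AND object_type:{object_type}"


def select_params(user_query, object_type):
    """URL-encoded query string for a /select request."""
    return urlencode({"q": with_type(user_query, object_type)})


print(with_type("size:8 AND color:blue", "shoe"))
print(select_params("genre:skater", "brand"))
```

Passing the type clause as a separate fq (filter query) parameter instead of AND-ing it into q is a possible variation that keeps the restriction out of relevance scoring.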
Re: Multiple indexes?
You cannot have more than one Solr core per application (to be precise, per class loader, since there are a few statics). One way is thus to have 2 webapps, if the indexes do not have the same lifetime, have radically different schemas, etc. However, the common wisdom is that you usually don't really need different indexes (I discussed this last week). If you really are in desperate need of multiple cores, in the 'Multiple Solr Cores' thread you'll find (early-stage) patches that allow just that... Cheers Henri Matthew Runo wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ -- View this message in context: http://www.nabble.com/Multiple-indexes--tf3608429.html#a10083580 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple indexes?
If you're doing this in Ruby, there is an acts_as_solr plugin for Rails which takes exactly this approach to store all different kinds of Model objects in the same index...I just took the idea from there... /Cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Ah hah! This appears to be what I'm interested in doing. I'll have to read up on object_types. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 10:04 AM, Cody Caughlan wrote: Why not just store an additional object_type field which differentiates between the actual type of data you are looking for? So if you're looking for some shoes: (size:8 AND color:'blue') AND object_type:'shoe' Or if you're searching on brands (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' I apologize if I misunderstood your question. /cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: Multiple indexes?
I'll actually be doing this in Perl.. any ideas on perl? heh ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 11:59 AM, Cody Caughlan wrote: If you're doing this in Ruby, there is an acts_as_solr plugin for Rails which takes exactly this approach to store all different kinds of Model objects in the same index...I just took the idea from there... /Cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Ah hah! This appears to be what I'm interested in doing. I'll have to read up on object_types. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 10:04 AM, Cody Caughlan wrote: Why not just store an additional object_type field which differentiates between the actual type of data you are looking for? So if you're looking for some shoes: (size:8 AND color:'blue') AND object_type:'shoe' Or if you're searching on brands (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' I apologize if I misunderstood your question. /cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: Multiple indexes?
Matthew, All that is meant by object_types is an additional stored/indexed field in the Solr schema that gets added to every document providing context of which type it is (shoes or brands). Then you can limit searches to a particular area by just filtering on type:shoes, for example. Erik p.s. I could use some new shoes! On Apr 19, 2007, at 3:17 PM, Matthew Runo wrote: I'll actually be doing this in Perl.. any ideas on perl? heh ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 11:59 AM, Cody Caughlan wrote: If you're doing this in Ruby, there is an acts_as_solr plugin for Rails which takes exactly this approach to store all different kinds of Model objects in the same index...I just took the idea from there... /Cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Ah hah! This appears to be what I'm interested in doing. I'll have to read up on object_types. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 10:04 AM, Cody Caughlan wrote: Why not just store an additional object_type field which differentiates between the actual type of data you are looking for? So if you're looking for some shoes: (size:8 AND color:'blue') AND object_type:'shoe' Or if you're searching on brands (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' I apologize if I misunderstood your question. /cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? 
++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
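The object_type idea from this thread can be sketched in a few lines of Python. This is only an illustration, not code from any of the posters: the helper names (`tag_documents`, `typed_query`) and the sample records are hypothetical, and the field names are the ones used in the example queries above.

```python
# Sketch: keep shoes and brands in one index by tagging every document
# with an object_type field, then restrict queries to one type.
# Helper names and sample records are hypothetical.

def tag_documents(records, object_type):
    """Return copies of the records with an object_type field added."""
    return [{**record, "object_type": object_type} for record in records]

def typed_query(clauses, object_type):
    """AND the field clauses together and restrict them to one object_type."""
    return "(" + " AND ".join(clauses) + ") AND object_type:" + object_type

shoes = tag_documents([{"id": "shoe-1", "size": 8, "color": "blue"}], "shoe")
brands = tag_documents(
    [{"id": "brand-1", "genre": "skater", "brand_desc": "skater boy"}], "brand"
)
all_docs = shoes + brands  # both types go into the same index

shoe_query = typed_query(["size:8", "color:blue"], "shoe")
print(shoe_query)
```

Fields that only apply to one type are simply absent from documents of the other type, so the shared schema has to declare them as optional.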
Re: Multiple indexes?
Ah. That makes sense then. I wasn't sure if that was the best way to go about things or not. I didn't want to end up with a bunch of fields that were not being used all the time if it would cause a degradation in search quality. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 12:59 PM, Erik Hatcher wrote: Matthew, All that is meant by object_types is an additional stored/indexed field in the Solr schema that gets added to every document providing context of which type it is (shoes or brands). Then you can limit searches to a particular area by just filtering on type:shoes, for example. Erik p.s. I could use some new shoes! On Apr 19, 2007, at 3:17 PM, Matthew Runo wrote: I'll actually be doing this in Perl.. any ideas on perl? heh ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 11:59 AM, Cody Caughlan wrote: If you're doing this in Ruby, there is an acts_as_solr plugin for Rails which takes exactly this approach to store all different kinds of Model objects in the same index...I just took the idea from there... /Cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Ah hah! This appears to be what I'm interested in doing. I'll have to read up on object_types. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 19, 2007, at 10:04 AM, Cody Caughlan wrote: Why not just store an additional object_type field which differentiates between the actual type of data you are looking for? So if you're looking for some shoes: (size:8 AND color:'blue') AND object_type:'shoe' Or if you're searching on brands (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' I apologize if I misunderstood your question. /cody On 4/19/07, Matthew Runo [EMAIL PROTECTED] wrote: Hey there- I was wondering if the following was possible, and, if so, how to set it up... 
I want to index two different types of data, and have them searchable from the same interface. For example, a group of products, with size, color, price, etc info. And a group of brands, with brand, genre, brand description, etc info So, the info does overlap some. But a lot of the fields for each type don't matter to the other. Is there a way to set up two different schema so that both types may be indexed with relative ease? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: Multiple indexes?
: So if you're looking for some shoes: : (size:8 AND color:'blue') AND object_type:'shoe' : Or if you're searching on brands : (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' a slight improvement on this: put your object_type restriction in a filter query (fq=object_type:foo) and not in your main query ... that way it won't affect the scoring, and it will be cached uniquely so the work of identifying the set of all shoes will only be done once per commit (and likewise for brands) -Hoss
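A minimal sketch of the filter-query suggestion, assuming the request parameters are assembled in Python: the main query goes in `q` and the type restriction goes in `fq`. The function name `build_select_params` is made up for illustration; `q` and `fq` are the parameter names used in the message above.

```python
from urllib.parse import urlencode, parse_qs

def build_select_params(main_query, object_type):
    """Encode a select request with the type restriction in fq, not in q."""
    return urlencode({
        "q": main_query,                     # scored part of the request
        "fq": "object_type:" + object_type,  # unscored, cached filter
    })

params = build_select_params("size:8 AND color:blue", "shoe")
print(params)

# Decoding the string again shows the two parts stay separate.
decoded = parse_qs(params)
```

Because the filter string is identical for every shoe search, the same cached filter can be reused across requests, which is the point of keeping it out of `q`.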
Re: Multiple indexes?
As this question comes up so often, I put a new page on the wiki: http://wiki.apache.org/solr/MultipleIndexes We should fill in more details and link it to the front page. Chris Hostetter wrote: : So if you're looking for some shoes: : (size:8 AND color:'blue') AND object_type:'shoe' : Or if you're searching on brands : (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand' a slight improvement on this: put your object_type restriction in a filter query (fq=object_type:foo) and not in your main query ... that way it won't affect the scoring, and it will be cached uniquely so the work of identifying the set of all shoes will only be done once per commit (and likewise for brands) -Hoss
Re: multiple indexes
Why not create a multivalued field that stores the customer perms? add has_access:cust1 has_access:cust2, etc to the document at index time, and turn this into a filter query at query time? That is what we are doing at the moment, and I must say, it works very well and does not slow the server down at all (because of the efficient indexes that Solr builds). On 22/03/2007 19:15, Mike Klaas [EMAIL PROTECTED] wrote: On 3/22/07, Kevin Osborn [EMAIL PROTECTED] wrote: Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary in size greatly. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't. Has anybody worked on this issue before? If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there? The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic). Why not create a multivalued field that stores the customer perms? add has_access:cust1 has_access:cust2, etc to the document at index time, and turn this into a filter query at query time? -Mike
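The has_access approach might look roughly like this in Python. It is a sketch under assumptions: the helper names and the sample document are hypothetical, and only the `has_access` field name and the filter-query idea come from the thread.

```python
from urllib.parse import urlencode, parse_qs

def grant_access(doc, customer_ids):
    """Return a copy of doc with a multivalued has_access field."""
    return {**doc, "has_access": sorted(set(customer_ids))}

def access_filter(customer_id):
    """Filter query that keeps only documents this customer may see."""
    return "has_access:" + customer_id

# Index time: list every customer that may see the document.
doc = grant_access({"id": "doc-42", "title": "Widget manual"}, ["cust2", "cust1"])

# Query time: the user's search goes in q, the permission check in fq.
params = urlencode({"q": "widget", "fq": access_filter("cust1")})
print(doc["has_access"])
print(params)
```

Since the filtering happens inside Solr, facet counts only include documents the customer is allowed to see, which addresses the concern raised in the original question.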
Re: multiple indexes
: Why not create a multivalued field that stores the customer perms? : add has_access:cust1 has_access:cust2, etc to the document at index : time, and turn this into a filter query at query time? this can be a particularly effective solution when the permissions don't change at all .. the ideal solution is where each doc is owned by one and only one customer, but either way it's a matter of listing all of the customers that have access to the document in a field, and filtering on it. -- for a few hundred customers it's not a lot of work to cache those filters, autowarming will help ensure that it's efficient. this approach doesn't scale particularly well to the tens of thousands of users that might search your site, but at that point you have to start thinking about how you model the access in your underlying datamodel ... odds are you have some concept of public documents versus private documents, and the private documents might have Access Control Lists based on groups and you can filter on that type of information instead. -Hoss
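For the larger-scale case, a group-based filter could be sketched as follows. The field names `visibility` and `acl_group` are assumptions made up for this illustration (the thread does not name any), as is the helper `group_filter`.

```python
def group_filter(user_groups):
    """Build a filter query from a user's groups instead of a per-user list.

    Public documents are always visible; private ones are visible when the
    document's acl_group field contains one of the user's groups.
    Both field names are hypothetical.
    """
    if not user_groups:
        return "visibility:public"
    groups = " OR ".join(sorted(set(user_groups)))
    return "visibility:public OR acl_group:(" + groups + ")"

anonymous_fq = group_filter([])
staff_fq = group_filter(["support", "sales"])
print(anonymous_fq)
print(staff_fq)
```

The number of distinct filters now depends on the number of group combinations rather than the number of users, so far fewer filters need to be cached. Group names containing spaces or query-syntax characters would need escaping, which this sketch leaves out.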
Re: Multiple indexes
This is good information, thanks Chris. My preference was to keep things separate, just needed some external info from others to back me up. thanks, jeff On 1/7/07, Chris Hostetter [EMAIL PROTECTED] wrote: I don't know if there really are any general purpose best practices ... it really depends on use cases -- the main motivation for allowing JNDI context specification of the solr.home location so that multiple instances of Solr can run in a single instance of a servlet container was so that if you *wanted* to run multiple instances in a single JVM, they could share one heap space, and you wouldn't have to guess how much memory to allocate to multiple instances -- but whether or not you *want* to have a single instance is really up to you. the plus side (as i mentioned) is that you can throw all of your available memory at that single JVM instance, and not worry about how much RAM each Solr instance really needs. the down side is that if any one Solr instance really gets hammered to hell by its users and rolls over and dies, it could bring down your other Solr instances as well -- which may not be a big deal if in your use cases all Solr instances get hit equally (via a meta searcher) but might be quite a big problem if those separate instances are completely independent (ie: each paid for by separate clients) personally: if you've got the resources (money/boxes/RAM) i would recommend keeping everything isolated. (the nice thing about my job is that while i frequently walk out of meetings with the directive to make it faster, I've never been asked to make it use less RAM) -Hoss
Re: Multiple indexes...
What is the advantage to running multiple indexes from a single Solr instance over multiple Solr instances each serving a single index? Erik On Dec 21, 2006, at 3:26 PM, escher2k wrote: I looked at the forums and found that it is not possible to have multiple indexes associated with one app server instance? Is the best way to run multiple app server instances? It would be a nice enhancement to support parameterization of the index to be used. -- View this message in context: http://www.nabble.com/Multiple-indexes...-tf2867500.html#a8014384 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple indexes...
I guess the updates can also be done the same way then. So, I just have to create multiple context paths with different schema.xml files in each? Thanks. ryan mckinley wrote: You can run multiple webapps on a single app server (running on a single port). Just give each index a separate context path. For example, you could have: http://xyz:8765/index1/select/?q=xxx http://xyz:8765/index2/select/?q=xxx http://xyz:8765/index3/select/?q=xxx On 12/21/06, escher2k [EMAIL PROTECTED] wrote: I looked at the forums and found that it is not possible to have multiple indexes associated with one app server instance? Is the best way to run multiple app server instances? It would be a nice enhancement to support parameterization of the index to be used. -- View this message in context: http://www.nabble.com/Multiple-indexes...-tf2867500.html#a8014384 Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Multiple-indexes...-tf2867500.html#a8018019 Sent from the Solr - User mailing list archive at Nabble.com.
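A client could pick the index simply by choosing the context path when it builds the URL. The sketch below uses the host, port and index names from the example above; the helper names are hypothetical, and the `/update` path is an assumption based on the remark that updates work the same way.

```python
from urllib.parse import urlencode

BASE = "http://xyz:8765"  # host and port from the example in the thread

def select_url(index, query):
    """Query URL for the webapp deployed under the given context path."""
    query_string = urlencode({"q": query})
    return BASE + "/" + index + "/select/?" + query_string

def update_url(index):
    """Update URL for the same webapp (path assumed to be /update)."""
    return BASE + "/" + index + "/update"

urls = [select_url(name, "xxx") for name in ("index1", "index2", "index3")]
for url in urls:
    print(url)
print(update_url("index2"))
```

Each context path is a separate webapp with its own Solr home, so a document posted to one index's update URL is only searchable through that same index's select URL.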
Re: Multiple indexes...
: I guess the updates can also be done the same way then. So, I just have : to create multiple context paths with different schema.xml files in each ? : Thanks. you wouldn't even need different schema.xml files ... they could all be symlinks to the same schema.xml, only the data directories would need to be different. -Hoss
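The symlink layout could be set up along these lines. This is a sketch under assumptions: it runs in a temporary directory, the directory names (`shared`, `conf`, `data`) are illustrative, the schema file is a placeholder rather than a real Solr schema, and it needs a filesystem that supports symlinks.

```python
import os
import tempfile

def make_instances(root, names):
    """Create one home per index; all conf/schema.xml files link to one file."""
    shared = os.path.join(root, "shared", "schema.xml")
    os.makedirs(os.path.dirname(shared), exist_ok=True)
    with open(shared, "w") as handle:
        handle.write("<!-- placeholder, not a real schema -->\n")

    homes = {}
    for name in names:
        home = os.path.join(root, name)
        os.makedirs(os.path.join(home, "conf"))
        os.makedirs(os.path.join(home, "data"))  # separate data dir per index
        os.symlink(shared, os.path.join(home, "conf", "schema.xml"))
        homes[name] = home
    return shared, homes

root = tempfile.mkdtemp()
shared, homes = make_instances(root, ["index1", "index2", "index3"])
for name, home in homes.items():
    link = os.path.join(home, "conf", "schema.xml")
    print(name, "->", os.path.realpath(link))
```

Editing the shared file changes the schema for every index at once, while each index keeps writing to its own data directory.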