Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?
Thank you, Susheel, for the quick response.

So that means that when I create a new collection, its shards will be newly created on each node, right? Thus, if I have two collections with numShards=38, maxShardsPerNode=2 and replicationFactor=2 on my 38 nodes, each node would end up "hosting" 4 shards (two from each collection).

If this is correct, I have two follow-up questions:

1) Naming of the shards: Is it o.k. to use the same shard names in this constellation? I.e., does it cause trouble to have e.g. "Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002", etc. in collection2 as well?

2) Performance: In my current single-collection setup, I have 2 shards per node. After creating the second collection, there will be 4 shards per node. Do I have to change the RAM per node (raise the -m parameter when starting the node)? In my case, I am quite sure that the two collections will never be queried simultaneously. So will the "running but idle" collection slow me down?

Johannes

-----Original Message-----
From: Susheel Kumar [mailto:susheel2...@gmail.com]
Sent: Wednesday, 30 August 2017 17:36
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

Yes, absolutely. You can create as many collections as you need (like you would create tables in the relational world).

On Wed, Aug 30, 2017 at 10:13 AM, Johannes Knaus wrote:
> I have a working SolrCloud setup with 38 nodes and a collection
> spanning these nodes, with 2 shards per node, replication
> factor 2 and a router field.
>
> Now I got some new data for indexing which has the same structure and
> size as my existing index in the described collection.
> However, although it has the same structure, the new data to be indexed
> should not be mixed with the old data.
>
> Do I have to create another 38 new nodes and a new collection and index
> the new data, or is there a better / more efficient way I could use the
> existing nodes?
> Is it possible that the 2 collections could share the 38 nodes without
> the indexes being mixed?
>
> Thanks for your help.
>
> Johannes
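The "4 shards per node" figure in the message above can be checked with a little arithmetic. The following is only a sketch that assumes replicas are spread evenly across nodes; the function name is made up for illustration and is not part of Solr.

```python
def replicas_per_node(num_shards: int, replication_factor: int, num_nodes: int) -> float:
    """Average number of shard replicas one node hosts for a single collection.

    Assumes an even spread across nodes (an assumption, not something Solr guarantees).
    """
    return num_shards * replication_factor / num_nodes


# Numbers taken from the message: 38 shards, replicationFactor=2, 38 nodes.
per_collection = replicas_per_node(num_shards=38, replication_factor=2, num_nodes=38)

# Two collections with identical settings on the same 38 nodes.
total_per_node = 2 * per_collection

print(per_collection, total_per_node)
```

With these inputs each collection contributes two replicas per node, which stays within maxShardsPerNode=2 for each collection and gives four per node overall.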
SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?
I have a working SolrCloud setup with 38 nodes and a collection spanning these nodes, with 2 shards per node, replication factor 2 and a router field.

Now I got some new data for indexing which has the same structure and size as my existing index in the described collection. However, although it has the same structure, the new data to be indexed should not be mixed with the old data.

Do I have to create another 38 new nodes and a new collection and index the new data, or is there a better / more efficient way I could use the existing nodes? Is it possible that the 2 collections could share the 38 nodes without the indexes being mixed?

Thanks for your help.

Johannes
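A second collection on the same nodes would be created through the Collections API CREATE action. The sketch below only builds the request URL and sends nothing. The base URL and the collection name "collection2" are placeholders, and router and configset parameters are left out because the thread does not state them.

```python
from urllib.parse import urlencode

# Placeholder base URL; replace with any node of the actual cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name: str, num_shards: int, replication_factor: int,
                          max_shards_per_node: int, base: str = SOLR_BASE) -> str:
    """Build a Collections API CREATE URL (router/configset parameters omitted)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


# Same sizing as the existing collection, under a new (hypothetical) name.
url = create_collection_url("collection2", num_shards=38,
                            replication_factor=2, max_shards_per_node=2)
print(url)
```

Because each collection keeps its own index, documents sent to the new collection do not mix with the existing one even though both live on the same 38 nodes.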
Sharding vs single index vs separate collection
Hi,

I have a SolrCloud setup with document routing (implicit routing with a router field). As the index is about documents with a publication date, I routed according to the publication year, since in my case most of the search queries will have a year specified.

Now, what would be the best strategy, as regards performance (i.e. a huge number of queries to be processed), for search queries without any year specified?

1 - Is it enough to define that these queries should go over all routes (i.e. route=year1, year2, ..., yearN)?

2 - Would it be better to add a separate node with a separate index that is not routed (but maybe sharded/split)? If so, how should I deal with such a separate index? Is it possible to add it to my existing SolrCloud? Would it go into a separate collection?

Thanks for your advice.

Johannes
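For illustration, option 1 could look roughly like the sketch below. It assumes the shards are named after years (e.g. "2015"), which the message does not state, and uses a placeholder host and collection name. Solr's query-side routing parameter is `_route_`; when it is omitted, SolrCloud sends the query to all shards of the collection, so listing every year explicitly should not be needed.

```python
from typing import Optional
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # placeholder host
COLLECTION = "publications"               # hypothetical collection name


def select_url(query: str, year: Optional[int] = None) -> str:
    """Build a /select URL, restricted to one year's shard only if a year is given."""
    params = {"q": query}
    if year is not None:
        # Assumption: the route value equals the shard name, i.e. the year.
        params["_route_"] = str(year)
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


print(select_url("title:solr", year=2015))  # goes to the 2015 shard only
print(select_url("title:solr"))             # no route: fans out to all shards
```

Whether the fan-out is fast enough under heavy query load is exactly the open question in the message; the sketch only shows how the two kinds of request differ.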
Re: AW: What does the replication factor parameter in collections api do?
Thank you all very much for your answers. That definitely explains it.

All the best,
Johannes

> On 13.04.2017 at 17:03, Erick Erickson wrote:
>
> bq: Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API
>
> Because MODIFYCOLLECTION just changes properties in the collection
> definition generically and replicationFactor just happens to be one.
> IOW there's no overarching reason.
>
> It would be extra work to dis-allow that one case and possibly
> introduce errors without changing any functionality so nobody was
> willing to put in the effort.
>
> Best,
> Erick
>
>> On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey wrote:
>>> On 4/13/2017 3:22 AM, Johannes Knaus wrote:
>>> Ok. Thank you for your quick reply. Though I still feel a little
>>> uneasy. Why is it possible then to alter replicationFactor via
>>> MODIFYCOLLECTION in the collections API? What would be the use case
>>> for this parameter at all then?
>>
>> If you use a very specific storage method for your indexes -- HDFS --
>> then replicationFactor has meaning beyond initial collection creation,
>> in conjunction with the "autoAddReplicas" feature.
>>
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud
>>
>> If you are NOT utilizing the very specific HDFS storage engine, then
>> everything you were told applies. With standard storage mechanisms,
>> replicationFactor has zero meaning after initial collection creation,
>> and changing the value will have no effect.
>>
>> Thanks,
>> Shawn
>>
AW: What does the replication factor parameter in collections api do?
Ok. Thank you for your quick reply. Though I still feel a little uneasy. Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API? What would be the use case for this parameter at all then?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 12 April 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

really <3>. replicationFactor is used to set up your collection initially, you have to be able to change your topology afterwards so it's ignored thereafter. Once your replica is added, it's automatically made use of by the collection.

On Wed, Apr 12, 2017 at 9:30 AM, Johannes Knaus wrote:
> Hi,
>
> I am still quite new to Solr. I have the following setup:
> A SolrCloud setup with
> 38 nodes,
> maxShardsPerNode=2,
> implicit routing with routing field,
> and replication factor=2.
>
> Now, I want to add replicas. This works fine by first increasing
> maxShardsPerNode to a higher number and then adding replicas.
> So far, so good. I can confirm the changes to the maxShardsPerNode parameter and
> the added replicas in the Admin UI.
> However, the Solr Admin UI is still showing me a replication factor of 2.
> I am a little confused about what the replicationFactor parameter actually
> does in my case:
>
> 1) What does that mean? Does Solr make use of all replicas I have or only of
> two?
> 2) Do I need to increase the replication factor value as well to really have
> more replicas available and usable? If this is true, do I need to
> restart/reload the collection, upload the configs to ZooKeeper again, or
> anything like that?
> 3) Or is replicationFactor just a parameter that is needed for the first
> start of SolrCloud and can be ignored afterwards?
>
> Thank you very much for your help,
> All the best,
> Johannes
>
What does the replication factor parameter in collections api do?
Hi,

I am still quite new to Solr. I have the following setup: a SolrCloud setup with 38 nodes, maxShardsPerNode=2, implicit routing with a routing field, and replication factor=2.

Now I want to add replicas. This works fine by first increasing maxShardsPerNode to a higher number and then adding replicas. So far, so good. I can confirm the changes to the maxShardsPerNode parameter and the added replicas in the Admin UI. However, the Solr Admin UI is still showing me a replication factor of 2. I am a little confused about what the replicationFactor parameter actually does in my case:

1) What does that mean? Does Solr make use of all replicas I have or only of two?

2) Do I need to increase the replication factor value as well to really have more replicas available and usable? If this is true, do I need to restart/reload the collection, upload the configs to ZooKeeper again, or anything like that?

3) Or is replicationFactor just a parameter that is needed for the first start of SolrCloud and can be ignored afterwards?

Thank you very much for your help,
All the best,
Johannes
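The two steps described above (raise maxShardsPerNode, then add a replica) map onto the Collections API actions MODIFYCOLLECTION and ADDREPLICA. The sketch below only builds the request URLs. The host, the collection name "collection1", the shard name and the node name are placeholders and do not come from the message.

```python
from typing import Optional
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # placeholder host


def modify_max_shards_url(collection: str, max_shards_per_node: int) -> str:
    """Step 1: raise maxShardsPerNode on an existing collection."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def add_replica_url(collection: str, shard: str, node: Optional[str] = None) -> str:
    """Step 2: add one replica of a shard, optionally on a specific node."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# Hypothetical names for illustration only.
print(modify_max_shards_url("collection1", 3))
print(add_replica_url("collection1", "Shard001", node="host17:8983_solr"))
```

Neither request touches replicationFactor, which is why the Admin UI can keep showing 2 while the additional replicas exist.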