Re: SolrCloud shard distribution with Collections API
I've had a bad enough experience with the default shard placement that I create a collection with one shard, add the shards where I want them, then use add/delete replica to move the first one to the right machine/port. Typically this is in a SolrCloud of dozens or hundreds of shards. Our shards are all partitioned by time, so there are big performance advantages to optimal placement across JVMs and machines.

In what sort of situation do you not have trouble with the default shard placement?

On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com wrote:
> They should be pretty well distributed by default, but if you want to take manual control, you can use the createNodeSet param on CREATE (with replication factor of 1) and then ADDREPLICA with the node param to put replicas for shards exactly where you want.
>
> Best,
> Erick
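The "move" step ralph describes can be sketched as two Collections API requests. ADDREPLICA with the node parameter puts a new copy on the target node. DELETEREPLICA then drops the original copy. The sketch below only builds the request URLs and sends nothing. The base URL, collection, shard, target node name (in host:port_solr form) and the core node name core_node1 are hypothetical placeholders. Waiting until the new replica is active before issuing the delete is left to the caller.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MoveReplicaSketch {
    // Builds one Collections API URL from an action and its parameters, URL-encoding each value.
    static String collectionsUrl(String baseUrl, String action, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "/admin/collections?action=" + action + "&" + query;
    }

    // Returns the two requests that "move" a replica: add a copy on targetNode, then delete the old one.
    static List<String> moveReplica(String baseUrl, String collection, String shard,
                                    String targetNode, String oldReplica) {
        Map<String, String> add = new LinkedHashMap<>();
        add.put("collection", collection);
        add.put("shard", shard);
        add.put("node", targetNode);

        Map<String, String> delete = new LinkedHashMap<>();
        delete.put("collection", collection);
        delete.put("shard", shard);
        delete.put("replica", oldReplica);

        return List.of(
                collectionsUrl(baseUrl, "ADDREPLICA", add),
                // Issue this only after the new replica is active; otherwise the shard
                // is left with fewer live copies.
                collectionsUrl(baseUrl, "DELETEREPLICA", delete));
    }

    public static void main(String[] args) {
        moveReplica("http://server1:8983/solr", "cp_collection", "shard1",
                "server2:8983_solr", "core_node1").forEach(System.out::println);
    }
}
```

Keeping URL construction separate from sending makes it easy to print and review a whole placement plan before touching a cluster of dozens or hundreds of shards.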
Re: SolrCloud shard distribution with Collections API
When using the Collections API CREATE action, I found that the default shard placement is sometimes correct (leader and replica on different servers) and sometimes not. So I was looking for a simple and reliable way to ensure better placement. It seems I will have to do it manually for best control, as recommended by Erick and you.

Thanks,
Isabelle

PS: I deleted emails from the thread history, because my reply kept being rejected by the Apache server as spam...

On Thu, Nov 6, 2014 at 8:13 AM, ralph tice ralph.t...@gmail.com wrote:
> I've had a bad enough experience with the default shard placement that I create a collection with one shard, add the shards where I want them, then use add/delete replica to move the first one to the right machine/port. Typically this is in a SolrCloud of dozens or hundreds of shards. Our shards are all partitioned by time, so there are big performance advantages to optimal placement across JVMs and machines.
>
> In what sort of situation do you not have trouble with the default shard placement?
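The requirement in this thread is that every shard's copies sit on different servers, not merely on different nodes. Whichever creation method is used, this can be checked mechanically. The sketch below is minimal and makes two assumptions. First, the placement has already been read (for example from the admin Cloud view) into a map from shard name to the node names hosting that shard. Second, node names follow the host:port_solr form, so the host is everything before the first colon. The server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class HostSpreadCheck {
    // Extracts the host from a SolrCloud node name such as "server1:8983_solr".
    static String hostOf(String nodeName) {
        int colon = nodeName.indexOf(':');
        return colon < 0 ? nodeName : nodeName.substring(0, colon);
    }

    // Returns the shards (in sorted order) that have two or more copies on the same host.
    static List<String> shardsSharingAHost(Map<String, List<String>> placement) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(placement).entrySet()) {
            Set<String> hosts = new HashSet<>();
            for (String node : e.getValue()) {
                if (!hosts.add(hostOf(node))) {
                    bad.add(e.getKey());
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // 2 servers x 2 nodes, 2 shards x 2 copies, as in the test cluster described above.
        Map<String, List<String>> placement = Map.of(
                "shard1", List.of("server1:8983_solr", "server1:7574_solr"),
                "shard2", List.of("server2:8983_solr", "server1:8983_solr"));
        System.out.println(shardsSharingAHost(placement)); // [shard1]
    }
}
```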
Re: SolrCloud shard distribution with Collections API
They should be pretty well distributed by default, but if you want to take manual control, you can use the createNodeSet param on CREATE (with replication factor of 1) and then ADDREPLICA with the node param to put replicas for shards exactly where you want.

Best,
Erick

On Wed, Nov 5, 2014 at 2:12 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
> Hello,
>
> I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on each server, so that each collection can have 2 shards with a replication factor of 2. I am using the command below from the Collections API to create a collection:
>
> curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'
>
> Is there a way to ensure that, for each shard, the leader and replica are on different servers? This command sometimes puts them on 2 nodes of the same server.
>
> Thanks a lot for your help,
> Isabelle
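Erick's suggestion can be sketched as a plan of requests. First, CREATE with replicationFactor=1 and createNodeSet limited to one node per shard. Then, one ADDREPLICA per shard, with node pointing at the server that should hold the second copy. The sketch only builds URLs. Server names and ports are hypothetical, and it assumes the default shard names shard1, shard2, ... that numShards produces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ManualPlacementPlan {
    // URL-encodes one parameter value (node names contain ':' and createNodeSet contains ',').
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds CREATE (one copy per shard, restricted to primaryNodes) followed by
    // one ADDREPLICA per shard, placing shard i's second copy on replicaNodes.get(i - 1).
    static List<String> plan(String baseUrl, String name, String configName,
                             List<String> primaryNodes, List<String> replicaNodes) {
        if (replicaNodes.size() != primaryNodes.size()) {
            throw new IllegalArgumentException("need one replica node per shard");
        }
        int numShards = primaryNodes.size();
        String api = baseUrl + "/admin/collections?action=";
        List<String> urls = new ArrayList<>();
        urls.add(api + "CREATE"
                + "&name=" + enc(name)
                + "&numShards=" + numShards
                + "&replicationFactor=1"
                + "&collection.configName=" + enc(configName)
                + "&createNodeSet=" + enc(String.join(",", primaryNodes)));
        for (int i = 1; i <= numShards; i++) {
            urls.add(api + "ADDREPLICA"
                    + "&collection=" + enc(name)
                    + "&shard=shard" + i
                    + "&node=" + enc(replicaNodes.get(i - 1)));
        }
        return urls;
    }

    public static void main(String[] args) {
        plan("http://server1:8983/solr", "cp_collection", "cp_config",
                List.of("server1:8983_solr", "server1:7574_solr"),
                List.of("server2:8983_solr", "server2:7574_solr"))
                .forEach(System.out::println);
    }
}
```

The plan puts both first copies on server1's two nodes and both second copies on server2's two nodes. As a result, each shard spans both servers regardless of which server1 node CREATE picked for which shard. The trade-off is that both initial leaders sit on server1. If that matters, look up where each shard landed before choosing each ADDREPLICA target.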
Re: SolrCloud shard distribution with Collections API
Thanks for the advice, Erick.

Do you know what the underlying logic for the shard distribution is? Does it depend on the order in which each node joined the cluster, or does the Collections API logic actually check the node's host IP to ensure an even distribution?

Best Regards,
Isabelle

On Wed, Nov 5, 2014 at 3:10 PM, Erick Erickson erickerick...@gmail.com wrote:
> They should be pretty well distributed by default, but if you want to take manual control, you can use the createNodeSet param on CREATE (with replication factor of 1) and then ADDREPLICA with the node param to put replicas for shards exactly where you want.
>
> Best,
> Erick
Re: SolrCloud shard distribution with Collections API
Hi Isabelle,

If I understood your question correctly, you can check the shard distribution status on the admin page (http://localhost:8983/solr/#/~cloud).

If you started Solr using a command like

$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

(https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud), then the nodes are assigned in this order: shard1 leader -> shard2 leader -> shard1 replica -> shard2 replica. Of course, from the second startup onward, a node's status does not change.

Best,
Chunki

On Nov 6, 2014, at 2:46 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
> Thanks for the advice, Erick.
>
> Do you know what the underlying logic for the shard distribution is? Does it depend on the order in which each node joined the cluster, or does the Collections API logic actually check the node's host IP to ensure an even distribution?
>
> Best Regards,
> Isabelle
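Chunki's answer describes the startup-time flow, where cores register as nodes start (as in the Getting Started guide). It does not describe the Collections API CREATE call. The order described amounts to round-robin over the shards in registration order. The sketch below is a minimal model of that order. It assumes each newly registered core joins the next shard in turn and that the first core in a shard becomes its leader. It mirrors the described order only and is not Solr's code.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinOrderAssignment {
    // Labels the k-th core to register (0-based) with its shard and role,
    // cycling through shard1..shardN; the first core in each shard is the leader.
    static List<String> assign(int numCores, int numShards) {
        List<String> roles = new ArrayList<>();
        for (int k = 0; k < numCores; k++) {
            int shard = k % numShards + 1;
            String role = k < numShards ? "leader" : "replica";
            roles.add("shard" + shard + " " + role);
        }
        return roles;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2));
        // [shard1 leader, shard2 leader, shard1 replica, shard2 replica]
    }
}
```

Under this model, placement depends only on start order. Take two servers with two nodes each. Starting both nodes on one server and then both on the other gives every shard one copy per server. Alternating servers (s1, s2, s1, s2) instead puts both copies of shard1 on the first server.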
Re: SolrCloud - shard distribution
I just tried this. I started 6 nodes with collection1 spread across two shards. I looked at the admin->cloud->graph view and everything looked right and green.

Next, I copied and pasted your command and refreshed the cloud graph view. I see a new collection called consumer1 - all of its nodes are green and the collection consists of 3 shards. Each shard has 1 leader and 1 replica, each hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark

On Jan 9, 2013, at 10:58 AM, James Thomas jtho...@camstar.com wrote:
> Hi,
>
> Simple question, I hope.
>
> Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr nodes. I issued the following command to create a collection with 3 shards and a replication factor of 2, so a total of 6 shards.
>
> curl 'http://localhost:11000/solr/admin/collections?action=CREATE&name=consumer1&numShards=3&replicationFactor=2'
>
> The end result was the following shard distribution:
>
> shard1 - node #13, #15 (with #13 as leader)
> shard2 - node #15, #16 (with #15 as leader)
> shard3 - node #11, #16 (with #11 as leader)
>
> Since I am using the default value of 1 for 'maxShardsPerNode', I was surprised to see that Solr created two shards on instance #16. I expected that each of the 6 Solr nodes would be assigned one shard from the collection.
>
> Is this a bug or expected behavior?
>
> Thanks,
> James
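James's expectation can be phrased as a check on the outcome: with maxShardsPerNode=1, no node should host more than one core of the collection. The sketch below is minimal. It counts cores per node in the distribution James reported (node labels as in his message) and lists the nodes over the limit. It inspects the result only and says nothing about how Solr chose the nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MaxShardsPerNodeCheck {
    // Counts how many cores each node hosts and returns (sorted) the nodes above the limit.
    static List<String> nodesOverLimit(Map<String, List<String>> placement, int maxShardsPerNode) {
        Map<String, Integer> coresPerNode = new TreeMap<>();
        for (List<String> nodes : placement.values()) {
            for (String node : nodes) {
                coresPerNode.merge(node, 1, Integer::sum);
            }
        }
        return coresPerNode.entrySet().stream()
                .filter(e -> e.getValue() > maxShardsPerNode)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The distribution James reported for consumer1.
        Map<String, List<String>> reported = Map.of(
                "shard1", List.of("#13", "#15"),
                "shard2", List.of("#15", "#16"),
                "shard3", List.of("#11", "#16"));
        System.out.println(nodesOverLimit(reported, 1)); // [#15, #16]
    }
}
```

The check flags #15 as well as #16: both nodes host two cores of consumer1.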
RE: SolrCloud - shard distribution
Thanks for the quick reply, Mark.

I tried all kinds of variations but could not get all 6 nodes to participate. So I downloaded the source code and took a look at OverseerCollectionProcessor.java. I think my result is as coded. Line 251 has this loop:

    for (int i = 1; i <= numSlices; i++) {
      for (int j = 1; j <= repFactor; j++) {
        String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());
        // ...
      }
    }

So for my inputs, numSlices=3 and repFactor=2, and the logic here will choose the same node for these two slices:

--- slice1, rep2 (i=2,j=1) ==> chooses node[1]
--- slice2, rep1 (i=1,j=2) ==> chooses node[1]

BTW, I did notice the comment in the code:

    // we need to look at every node and see how many cores it serves
    // add our new cores to existing nodes serving the least number of cores
    // but (for now) require that each core goes on a distinct node.
    // TODO: add smarter options that look at the current number of cores per
    // node?
    // for now we just go random

Thanks,
James

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, January 09, 2013 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud - shard distribution

I just tried this. I started 6 nodes with collection1 spread across two shards. I looked at the admin->cloud->graph view and everything looked right and green.

Next, I copied and pasted your command and refreshed the cloud graph view. I see a new collection called consumer1 - all of its nodes are green and the collection consists of 3 shards. Each shard has 1 leader and 1 replica, each hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark
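James's reading of the loop can be confirmed by running the index expression on its own. The sketch below keeps only the nodeName index formula quoted from OverseerCollectionProcessor.java. It substitutes six hypothetical node labels for nodeList. For comparison, it also runs a hypothetical alternative, (i - 1) * repFactor + (j - 1), which numbers cores consecutively. That alternative gives each core its own node whenever there are at least numSlices * repFactor nodes. It is an illustration only, not what Solr does.

```java
import java.util.ArrayList;
import java.util.List;

final class PlacementLoop {
    // Signature shared by both index formulas so the same loop can run either one.
    interface IndexFormula {
        int apply(int i, int j, int repFactor, int nodeCount);
    }

    // Formula from the quoted loop: depends only on i + j (repFactor is unused),
    // so (i=1, j=2) and (i=2, j=1) pick the same node.
    static int quotedIndex(int i, int j, int repFactor, int nodeCount) {
        return ((i - 1) + (j - 1)) % nodeCount;
    }

    // Hypothetical alternative: number the cores consecutively, slice by slice.
    static int consecutiveIndex(int i, int j, int repFactor, int nodeCount) {
        return ((i - 1) * repFactor + (j - 1)) % nodeCount;
    }

    // Runs the same nested loop as the quoted code and returns one "sliceN -> [nodes]" line per slice.
    static List<String> place(int numSlices, int repFactor, List<String> nodeList, IndexFormula f) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= numSlices; i++) {
            List<String> nodes = new ArrayList<>();
            for (int j = 1; j <= repFactor; j++) {
                nodes.add(nodeList.get(f.apply(i, j, repFactor, nodeList.size())));
            }
            out.add("slice" + i + " -> " + nodes);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> nodeList = List.of("node0", "node1", "node2", "node3", "node4", "node5");
        place(3, 2, nodeList, PlacementLoop::quotedIndex).forEach(System.out::println);
        // slice1 -> [node0, node1]
        // slice2 -> [node1, node2]
        // slice3 -> [node2, node3]
        place(3, 2, nodeList, PlacementLoop::consecutiveIndex).forEach(System.out::println);
        // slice1 -> [node0, node1]
        // slice2 -> [node2, node3]
        // slice3 -> [node4, node5]
    }
}
```

The quoted formula uses four distinct nodes, and two of them host two cores each. This is the same shape as the distribution James reported, where #15 and #16 each host two cores. Which label corresponds to which Solr instance depends on the order of nodeList.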
RE: SolrCloud - shard distribution
Oops, small copy-paste error. I had my i's and j's backwards. It should be:

--- slice1, rep2 (i=1,j=2) ==> chooses node[1]
--- slice2, rep1 (i=2,j=1) ==> chooses node[1]

-----Original Message-----
From: James Thomas [mailto:jtho...@camstar.com]
Sent: Wednesday, January 09, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud - shard distribution

Thanks for the quick reply, Mark.

I tried all kinds of variations but could not get all 6 nodes to participate. So I downloaded the source code and took a look at OverseerCollectionProcessor.java. I think my result is as coded. Line 251 has this loop:

    for (int i = 1; i <= numSlices; i++) {
      for (int j = 1; j <= repFactor; j++) {
        String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());
        // ...
      }
    }

So for my inputs, numSlices=3 and repFactor=2, and the logic here will choose the same node for these two slices:

--- slice1, rep2 (i=2,j=1) ==> chooses node[1]
--- slice2, rep1 (i=1,j=2) ==> chooses node[1]

BTW, I did notice the comment in the code:

    // we need to look at every node and see how many cores it serves
    // add our new cores to existing nodes serving the least number of cores
    // but (for now) require that each core goes on a distinct node.
    // TODO: add smarter options that look at the current number of cores per
    // node?
    // for now we just go random

Thanks,
James
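The corrected indices are a special case of a general property of the quoted expression. Write $N$ for nodeList.size(). The index depends only on $i + j$, so any two (slice, replica) pairs with the same sum land on the same node. The number of distinct nodes used therefore can never exceed numSlices + repFactor - 1:

```latex
\begin{aligned}
\operatorname{idx}(i,j) &= \big((i-1) + (j-1)\big) \bmod N = (i + j - 2) \bmod N \\
\operatorname{idx}(1,2) &= 1 \bmod 6 = 1 = \operatorname{idx}(2,1) \\
\bigl|\{\operatorname{idx}(i,j) : 1 \le i \le 3,\ 1 \le j \le 2\}\bigr| &\le 3 + 2 - 1 = 4 < N = 6
\end{aligned}
```

So with 6 nodes, 3 slices and a replication factor of 2, at most 4 nodes receive cores. This matches the four distinct nodes (#11, #13, #15, #16) in James's result.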