Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread ralph tice
I've had a bad enough experience with the default shard placement that I
create a collection with one shard, add the shards where I want them, then
use add/delete replica to move the first one to the right machine/port.
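A minimal Java sketch of the sequence described above. It only builds and prints the Collections API URLs and does not call Solr. Several assumptions are not from the original: the collection uses the implicit router, because CREATESHARD is only supported for implicitly routed collections; the Solr version has CREATESHARD, ADDREPLICA and DELETEREPLICA; and the base URL, collection and config names, time-based shard names, node names (host:port_solr) and the replica name core_node1 are all hypothetical. The real replica name has to be looked up in the cluster state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: builds the Collections API calls for the
// "create one shard, add the rest, then move the first" workflow.
// All names are hypothetical and assumed to be URL-safe.
class MoveFirstShardPlan {

    static String admin(String base, String query) {
        return base + "/admin/collections?" + query;
    }

    static List<String> plan(String base, String collection, String config,
                             String firstShard, String firstTargetNode,
                             String firstReplicaName, Map<String, String> otherShards) {
        List<String> urls = new ArrayList<>();
        // 1. Single-shard collection (assumes the implicit router)
        urls.add(admin(base, "action=CREATE&name=" + collection
                + "&router.name=implicit&shards=" + firstShard
                + "&collection.configName=" + config));
        // 2. Each remaining shard is created on a node we choose
        for (Map.Entry<String, String> e : otherShards.entrySet()) {
            urls.add(admin(base, "action=CREATESHARD&collection=" + collection
                    + "&shard=" + e.getKey() + "&createNodeSet=" + e.getValue()));
        }
        // 3. "Move" the first shard: add a replica on the desired node...
        urls.add(admin(base, "action=ADDREPLICA&collection=" + collection
                + "&shard=" + firstShard + "&node=" + firstTargetNode));
        // 4. ...then delete the original replica (only after the new one is active)
        urls.add(admin(base, "action=DELETEREPLICA&collection=" + collection
                + "&shard=" + firstShard + "&replica=" + firstReplicaName));
        return urls;
    }

    public static void main(String[] args) {
        Map<String, String> others = new LinkedHashMap<>();
        others.put("2014_10", "hostB:8983_solr");
        others.put("2014_11", "hostC:8983_solr");
        for (String url : plan("http://hostA:8983/solr", "logs", "logs_config",
                "2014_09", "hostA:7574_solr", "core_node1", others)) {
            System.out.println(url);
        }
    }
}
```

The sketch does not wait between steps 3 and 4. In practice you would confirm that the new replica is active before deleting the old one.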

Typically this is in a SolrCloud of dozens or hundreds of shards.  Our
shards are all partitioned by time so there are big performance advantages
to optimal placement across JVMs and machines.

In what situations have you not had trouble with the default shard placement?


On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com wrote:
 They should be pretty well distributed by default, but if you want to
 take manual control, you can use the createNodeSet param on CREATE
 (with replication factor of 1) and then ADDREPLICA with the node param
 to put replicas for shards exactly where you want.

 Best,
 Erick

 On Wed, Nov 5, 2014 at 2:12 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
 Hello,

 I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on
 each server, so that each collection can have 2 shards with a replication
 factor of 2.

 I am using the following Collections API command to create the collection:

 curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'

 Is there a way to ensure that, for each shard, the leader and the replica are
 on different servers?
 This command sometimes puts them on 2 nodes of the same server.


 Thanks a lot for your help,

 Isabelle


Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread CTO직속IsabellePhan
When using the Collections API CREATE action, I found that the default shard
placement is sometimes correct (leader and replica on different servers) and
sometimes not, so I was looking for a simple and reliable way to ensure
better placement.
It seems I will have to do it manually for the best control, as Erick and
you recommended.
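Whichever way the collection is created, the bad case can be caught by checking the placement afterwards. Below is a minimal Java sketch that assumes you have already read each shard's replica node names (for example from the admin cloud view) into a map. Node names are assumed to follow the host:port_solr form, and the hosts in main are hypothetical. It only compares host names, so two names that resolve to the same machine would not be detected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: flag shards whose replicas ended up on the same host.
// Assumes node names of the form "host:port_solr".
class SameHostCheck {

    static String hostOf(String nodeName) {
        int colon = nodeName.indexOf(':');
        return colon < 0 ? nodeName : nodeName.substring(0, colon);
    }

    // Returns the names of shards that have two or more replicas on one host
    static List<String> shardsSharingAHost(Map<String, List<String>> placement) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : placement.entrySet()) {
            Set<String> hosts = new HashSet<>();
            for (String node : e.getValue()) {
                if (!hosts.add(hostOf(node))) { // host already seen for this shard
                    bad.add(e.getKey());
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, List<String>> placement = new LinkedHashMap<>();
        placement.put("shard1", List.of("server1:8983_solr", "server1:7574_solr"));
        placement.put("shard2", List.of("server2:8983_solr", "server1:8983_solr"));
        System.out.println(shardsSharingAHost(placement)); // [shard1]
    }
}
```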

Thanks,

Isabelle


PS: I deleted the earlier emails from the thread history because my reply kept
being rejected as spam by the Apache mail server.






Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread Erick Erickson
They should be pretty well distributed by default, but if you want to
take manual control, you can use the createNodeSet param on CREATE
(with replication factor of 1) and then ADDREPLICA with the node param
to put replicas for shards exactly where you want.

Best,
Erick
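A minimal Java sketch of this two-step recipe for the 2-server, 4-node setup in this thread: CREATE with replicationFactor=1 and createNodeSet, then one ADDREPLICA per shard with node set to a node on the other server. It only builds and prints URLs. Assumptions: the Solr version supports ADDREPLICA; node names have the host:port_solr form; server1, server2, the ports and the leader locations in main are hypothetical. Where each shard actually lands after CREATE is not chosen by this code and has to be read back from the cluster.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two-step manual placement: CREATE with replicationFactor=1
// and createNodeSet, then one ADDREPLICA per shard pinned with node=.
// Host names, ports and the collection/config names are hypothetical.
class ManualPlacement {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    // "server1:8983_solr" -> "server1" (assumes host:port_solr node names)
    static String hostOf(String nodeName) {
        return nodeName.substring(0, nodeName.indexOf(':'));
    }

    static String createUrl(String base, String name, int numShards,
                            String config, List<String> nodeSet) {
        return base + "/admin/collections?action=CREATE&name=" + enc(name)
                + "&numShards=" + numShards + "&replicationFactor=1"
                + "&collection.configName=" + enc(config)
                + "&createNodeSet=" + enc(String.join(",", nodeSet));
    }

    static String addReplicaUrl(String base, String collection,
                                String shard, String node) {
        return base + "/admin/collections?action=ADDREPLICA&collection=" + enc(collection)
                + "&shard=" + enc(shard) + "&node=" + enc(node);
    }

    // First candidate on a different host than the leader, or null if there is none
    static String nodeOnOtherHost(String leaderNode, List<String> candidates) {
        for (String c : candidates) {
            if (!hostOf(c).equals(hostOf(leaderNode))) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String base = "http://server1:8983/solr";
        System.out.println(createUrl(base, "cp_collection", 2, "cp_config",
                List.of("server1:8983_solr", "server2:8983_solr")));

        // Leader locations must be read back after CREATE; these are assumed.
        Map<String, String> leaders = new LinkedHashMap<>();
        leaders.put("shard1", "server1:8983_solr");
        leaders.put("shard2", "server2:8983_solr");

        List<String> spare = new ArrayList<>(List.of("server1:7574_solr", "server2:7574_solr"));
        for (Map.Entry<String, String> e : leaders.entrySet()) {
            String node = nodeOnOtherHost(e.getValue(), spare);
            if (node == null) {
                throw new IllegalStateException("no free node on another host for " + e.getKey());
            }
            spare.remove(node); // use each spare node only once
            System.out.println(addReplicaUrl(base, "cp_collection", e.getKey(), node));
        }
    }
}
```

Picking the replica node by comparing host names, rather than relying on the order of the node list, is what guarantees that leader and replica end up on different servers in this sketch.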



Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread CTO직속IsabellePhan
Thanks for the advice, Erick.

Do you know what underlying logic performs the shard distribution?
Does it depend on the order in which each node joined the cluster, or does
the Collections API logic actually check the node host IP to ensure even
distribution?

Best Regards,

Isabelle





Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread Lee Chunki
Hi Isabelle,

If I understand your question correctly, you can check the shard distribution
status on the admin page:
http://localhost:8983/solr/#/~cloud

If you started Solr with a command like
$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
(https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud)

then the nodes are assigned in this order:
1-shard leader - 2-shard leader - 1-shard replica - 2-shard replica

Of course, from the second start on, a node's assignment does not change.

Best,
Chunki





Re: SolrCloud - shard distribution

2013-01-09 Thread Mark Miller
I just tried this. I started 6 nodes with collection1 spread across two shards. 
Looked at the admin-cloud-graph view and everything looked right and green.

Next, I copied and pasted your command and refreshed the cloud graph view.

I see a new collection called consumer1. All of its nodes are green and the 
collection consists of 3 shards. Each shard has 1 leader and 1 replica, each 
hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark

On Jan 9, 2013, at 10:58 AM, James Thomas jtho...@camstar.com wrote:

 Hi,
 
 Simple question, I hope.
 
 Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr 
 nodes.
 I issued the following command to create a collection with 3 shards and a 
 replication factor of 2, so a total of 6 shard replicas (cores):
 curl 'http://localhost:11000/solr/admin/collections?action=CREATE&name=consumer1&numShards=3&replicationFactor=2'
 The end result was the following shard distribution:
   shard1 - node #13, #15  (with #13 as leader)
   shard2 - node #15, #16  (with #15 as leader)
   shard3 - node #11, #16  (with #11 as leader)
 
 Since I am using the default value of 1 for 'maxShardsPerNode', I was 
 surprised to see that Solr created two shards on instance #16 (and likewise 
 on #15). I expected that each of the 6 Solr nodes would be assigned exactly 
 one shard replica from the collection. Is this a bug or expected behavior?
 
 Thanks,
 James



RE: SolrCloud - shard distribution

2013-01-09 Thread James Thomas
Thanks for the quick reply, Mark.
I tried all kinds of variations, but I could not get all 6 nodes to participate.
So I downloaded the source code and took a look at 
OverseerCollectionProcessor.java.
I think my result is what the code, as written, produces.

Line 251 has this loop:
  for (int i = 1; i <= numSlices; i++) {
    for (int j = 1; j <= repFactor; j++) {
      String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());

So for my inputs, numSlices=3 and repFactor=2.
And the logic here will choose the same node for these two cores:
--- slice1, rep2 (i=2,j=1)  ==> chooses node[1]
--- slice2, rep1 (i=1,j=2)  ==> chooses node[1]
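The collision can be reproduced without a cluster by replaying the index arithmetic of the quoted loop on its own. This minimal Java sketch assumes 6 live nodes and ignores how Solr orders nodeList; node[k] simply means the k-th list entry. The sequentialIndex formula is a hypothetical alternative for comparison only, not something in the Solr source.

```java
import java.util.Arrays;

// Replays the index arithmetic of the quoted loop for numSlices=3, repFactor=2
// over 6 nodes. node[k] stands for the k-th entry of nodeList.
class PlacementSim {

    // Formula from the quoted loop
    static int quotedIndex(int i, int j, int n) {
        return ((i - 1) + (j - 1)) % n;
    }

    // Hypothetical alternative (not Solr code): number the cores consecutively
    static int sequentialIndex(int i, int j, int repFactor, int n) {
        return ((i - 1) * repFactor + (j - 1)) % n;
    }

    // Number of cores each node receives under either formula
    static int[] coresPerNode(int numSlices, int repFactor, int n, boolean quoted) {
        int[] counts = new int[n];
        for (int i = 1; i <= numSlices; i++) {
            for (int j = 1; j <= repFactor; j++) {
                int idx = quoted ? quotedIndex(i, j, n) : sequentialIndex(i, j, repFactor, n);
                counts[idx]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int numSlices = 3, repFactor = 2, n = 6;
        for (int i = 1; i <= numSlices; i++) {
            for (int j = 1; j <= repFactor; j++) {
                System.out.printf("slice%d, rep%d -> node[%d]%n", i, j, quotedIndex(i, j, n));
            }
        }
        System.out.println(Arrays.toString(coresPerNode(numSlices, repFactor, n, true)));  // [1, 2, 2, 1, 0, 0]
        System.out.println(Arrays.toString(coresPerNode(numSlices, repFactor, n, false))); // [1, 1, 1, 1, 1, 1]
    }
}
```

Under the quoted formula, slice1/rep2 and slice2/rep1 both map to node[1], and slice2/rep2 and slice3/rep1 both map to node[2], leaving two nodes empty. That is consistent with #15 and #16 each hosting two cores in the distribution reported above. Numbering the cores consecutively spreads them one per node here and keeps a shard's replicas on distinct nodes as long as repFactor does not exceed the node count, but, like the original, it knows nothing about which nodes share a physical server.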

BTW, I did notice the comment in the code:
  // we need to look at every node and see how many cores it serves
  // add our new cores to existing nodes serving the least number of cores
  // but (for now) require that each core goes on a distinct node.
  
  // TODO: add smarter options that look at the current number of cores per
  // node?
  // for now we just go random

Thanks,
James






RE: SolrCloud - shard distribution

2013-01-09 Thread James Thomas
Oops, small copy-paste error: I had my i's and j's backwards.
It should be:
--- slice1, rep2 (i=1,j=2)  ==> chooses node[1]
--- slice2, rep1 (i=2,j=1)  ==> chooses node[1]
