Re: CREATE collection bug or feature?

2015-06-19 Thread Shawn Heisey
On 6/19/2015 11:15 AM, Jim.Musil wrote:
 I noticed that when I issue the CREATE collection command to the api, it does 
 not automatically put a replica on every live node connected to zookeeper.

 So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and 
 create a collection like this:

 /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

 It will only create a core on one of the three nodes. I can make it work if I 
 change replicationFactor to 3. When standing up an entire stack using chef, 
 this all gets a bit clunky. I don't see any option such as ALL that would 
 just create a replica on all nodes regardless of size.

 I'm guessing this is intentional, but curious about the reasoning.

If you tell it replicationFactor=1, then you get exactly that -- one
copy of your index.  I personally think that it would be a violation of
something known as the principle of least surprise for Solr to
automatically create replicas without being asked to.

I would assume that if you are writing automated tools to build indexes
and the servers hosting those indexes that your automation will be able
to calculate a reasonable replicationFactor, or calculate the number of
hosts to create based on a provided replicationFactor.
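
As an illustration of that kind of calculation, here is a minimal Python
sketch. It assumes the automation already knows how many Solr nodes it
has stood up and that each node should host at most one core of the
collection (maxShardsPerNode=1, as in the original URL). The function
names are hypothetical and are not part of Solr.

```python
from urllib.parse import urlencode


def replication_factor_for(node_count, num_shards=1):
    """Largest replicationFactor that fits when each node hosts one core.

    Assumption: maxShardsPerNode=1, so the collection needs
    num_shards * replicationFactor <= node_count.
    """
    if node_count < 1 or num_shards < 1:
        raise ValueError("node_count and num_shards must be positive")
    factor = node_count // num_shards
    if factor < 1:
        raise ValueError("not enough nodes for the requested numShards")
    return factor


def create_collection_path(name, config_name, node_count, num_shards=1):
    """Build the Collections API CREATE path with a computed replicationFactor."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor_for(node_count, num_shards),
        "maxShardsPerNode": 1,
        "collection.configName": config_name,
    }
    return "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Three nodes, one shard: one replica lands on every node.
    print(create_collection_path("my_collection", "my_config", node_count=3))
```

With three nodes and one shard this yields replicationFactor=3, which is
the value Jim found to work. When node_count is not a multiple of
numShards, some nodes are simply left without a core.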

A feature to have Solr itself automatically calculate a
replicationFactor based on the number of available hosts and the
numShards value provided is not a bad idea.  Please create a feature
request issue in Jira.  One way that this might be done is by setting
replicationFactor to auto or maybe a special number, perhaps 0 or -1.

https://issues.apache.org/jira/browse/SOLR

Thanks,
Shawn



Re: CREATE collection bug or feature?

2015-06-19 Thread Erick Erickson
Jim:

This is by design. There's no way to tell Solr to find all the cores
available and put one replica on each. In fact, you're explicitly
telling it to create one and only one replica, one and only one shard.
That is, your collection will have exactly one low-level core. But you
realized that...

As to the reasoning: consider heterogeneous collections all hosted on
the same Solr cluster. I have big collections, little collections,
some with high QPS rates, some not, etc. Having Solr do things like
this automatically would make managing this difficult.

Probably the real reason is nobody thought it would be useful in
the general case. And I probably concur. Adding a new node to an
existing cluster would result in unbalanced clusters etc.

I suppose a stop-gap would be to query the live_nodes in the cluster
and add that to the URL, don't know how much of a pain that would be
though.
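
One possible reading of that stop-gap, sketched in Python: ask the
Collections API for CLUSTERSTATUS, read the live_nodes list from the
response, and use its length as replicationFactor (optionally pinning
placement with createNodeSet). The base URL and function names here are
assumptions for illustration, and the exact response layout should be
checked against the Solr version in use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict; base_url is e.g. http://localhost:8983/solr."""
    query = urlencode({"action": "CLUSTERSTATUS", "wt": "json"})
    url = base_url.rstrip("/") + "/admin/collections?" + query
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def live_nodes_from_status(status):
    """Extract the live node names, assuming they sit under cluster.live_nodes."""
    return sorted(status["cluster"]["live_nodes"])


def create_path_for_all_nodes(name, config_name, live_nodes):
    """Build a CREATE path that asks for one replica per live node."""
    if not live_nodes:
        raise ValueError("no live nodes reported")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": len(live_nodes),
        "maxShardsPerNode": 1,
        "collection.configName": config_name,
        # Restrict placement to exactly the nodes that were counted.
        "createNodeSet": ",".join(live_nodes),
    }
    return "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Assumed address of any node in the cluster.
    status = fetch_cluster_status("http://localhost:8983/solr")
    nodes = live_nodes_from_status(status)
    print(create_path_for_all_nodes("my_collection", "my_config", nodes))
```

This only captures the cluster as it looks at creation time. A node that
joins later gets no replica, which is the imbalance mentioned above.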

Best,
Erick

On Fri, Jun 19, 2015 at 10:15 AM, Jim.Musil jim.mu...@target.com wrote:
 I noticed that when I issue the CREATE collection command to the api, it does 
 not automatically put a replica on every live node connected to zookeeper.

 So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and 
 create a collection like this:

 /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

 It will only create a core on one of the three nodes. I can make it work if I 
 change replicationFactor to 3. When standing up an entire stack using chef, 
 this all gets a bit clunky. I don't see any option such as ALL that would 
 just create a replica on all nodes regardless of size.

 I'm guessing this is intentional, but curious about the reasoning.

 Thanks!
 Jim


CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
I noticed that when I issue the CREATE collection command to the api, it does 
not automatically put a replica on every live node connected to zookeeper.

So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and 
create a collection like this:

/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

It will only create a core on one of the three nodes. I can make it work if I 
change replicationFactor to 3. When standing up an entire stack using chef, 
this all gets a bit clunky. I don't see any option such as ALL that would 
just create a replica on all nodes regardless of size.

I'm guessing this is intentional, but curious about the reasoning.

Thanks!
Jim


Re: CREATE collection bug or feature?

2015-06-19 Thread Jim . Musil
Thanks as always for the great answers!

Jim


On 6/19/15, 11:57 AM, Erick Erickson erickerick...@gmail.com wrote:

Jim:

This is by design. There's no way to tell Solr to find all the cores
available and put one replica on each. In fact, you're explicitly
telling it to create one and only one replica, one and only one shard.
That is, your collection will have exactly one low-level core. But you
realized that...

As to the reasoning: consider heterogeneous collections all hosted on
the same Solr cluster. I have big collections, little collections,
some with high QPS rates, some not, etc. Having Solr do things like
this automatically would make managing this difficult.

Probably the real reason is nobody thought it would be useful in
the general case. And I probably concur. Adding a new node to an
existing cluster would result in unbalanced clusters etc.

I suppose a stop-gap would be to query the live_nodes in the cluster
and add that to the URL, don't know how much of a pain that would be
though.

Best,
Erick

On Fri, Jun 19, 2015 at 10:15 AM, Jim.Musil jim.mu...@target.com wrote:
 I noticed that when I issue the CREATE collection command to the api,
it does not automatically put a replica on every live node connected to
zookeeper.

 So, for example, if I have 3 solr nodes connected to a zookeeper
ensemble and create a collection like this:

 
/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config

 It will only create a core on one of the three nodes. I can make it
work if I change replicationFactor to 3. When standing up an entire
stack using chef, this all gets a bit clunky. I don't see any option
such as ALL that would just create a replica on all nodes regardless
of size.

 I'm guessing this is intentional, but curious about the reasoning.

 Thanks!
 Jim