Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

2017-08-30 Thread Johannes Knaus
Thank you, Susheel, for the quick response.

So, that means that when I create a new collection, its shards will be newly 
created on each node, right?
Thus, if I have two collections with 
numShards=38, 
maxShardsPerNode=2 and 
replicationFactor=2 
on my 38 nodes, then this would result in each node "hosting" 4 shards (two 
from each collection).
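
The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, assuming Solr spreads the shard replicas evenly over the nodes; the function name `replicas_per_node` is hypothetical and not part of any Solr API. Strictly speaking, what each node hosts are shard replicas (cores), not whole shards.

```python
def replicas_per_node(num_shards, replication_factor, num_nodes):
    """Average number of shard replicas (cores) each node hosts.

    Assumes an even distribution across nodes, which is an
    assumption of this sketch and not guaranteed by the thread.
    """
    total_replicas = num_shards * replication_factor
    return total_replicas / num_nodes


# Values taken from the message above: 38 shards, replicationFactor=2, 38 nodes.
per_collection = replicas_per_node(38, 2, 38)

# Two collections with identical settings on the same 38 nodes.
total_for_two_collections = 2 * per_collection

print(per_collection, total_for_two_collections)
```

With these numbers, each collection stays within maxShardsPerNode=2, since that limit applies per collection.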

If this is correct, I have two follow up questions:

1) As regards naming of the shards: Is using the same naming for the shards 
o.k. in this constellation? I.e. does it create trouble to have e.g. 
"Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002", etc. as 
well in collection2?

2) Performance: In my current single collection setup, I have 2 shards per 
node. After creating the second collection, there will be 4 shards per node. Do 
I have to edit the RAM per node value (raise the -m parameter when starting the 
node)? In my case, I am quite sure that the collections will never be queried 
simultaneously. So will the "running but idle" collection slow me down?

Johannes

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Wednesday, August 30, 2017 17:36
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same 
nodes possible?

Yes, absolutely.  You can create as many collections as you need (like you 
would create tables in the relational world).
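
As a rough sketch of what creating a second collection on the same nodes could look like, the snippet below only builds Collections API CREATE URLs and does not send them. The host, the collection names, and the config set name `shared_conf` are assumptions for illustration; the shard settings are the ones mentioned in this thread. A collection using the implicit router would pass `router.name=implicit` and a `shards` list instead of `numShards`.

```python
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this address.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, config_name):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 38,
        "replicationFactor": 2,
        "maxShardsPerNode": 2,
        "collection.configName": config_name,
    }
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


# Two independent collections on the same cluster; the names are hypothetical.
url_old = create_collection_url("collection1", "shared_conf")
url_new = create_collection_url("collection2", "shared_conf")

print(url_old)
print(url_new)
```

Each collection keeps its own index, so documents sent to collection2 never end up in collection1.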

On Wed, Aug 30, 2017 at 10:13 AM, Johannes Knaus  wrote:

> I have a working SolrCloud-Setup with 38 nodes with a collection 
> spanning over these nodes with 2 shards per node and replication 
> factor 2 and a router field.
>
> Now I got some new data for indexing which has the same structure and 
> size as my existing index in the described collection.
> However, although it has the same structure the new data to be indexed 
> should not be mixed with the old data.
>
> Do I have to create another 38 new nodes and a new collection and index 
> the new data or is there a better / more efficient way I could use the 
> existing nodes?
> Is it possible that the 2 collections could share the 38 nodes without 
> the indexes being mixed?
>
> Thanks for your help.
>
> Johannes
>


SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

2017-08-30 Thread Johannes Knaus
I have a working SolrCloud-Setup with 38 nodes with a collection spanning over 
these nodes with 2 shards per node and replication factor 2 and a router field.

Now I got some new data for indexing which has the same structure and size as 
my existing index in the described collection.
However, although it has the same structure the new data to be indexed should 
not be mixed with the old data.

Do I have to create another 38 new nodes and a new collection and index the new 
data or is there a better / more efficient way I could use the existing nodes?
Is it possible that the 2 collections could share the 38 nodes without the 
indexes being mixed?

Thanks for your help.

Johannes


Sharding vs single index vs separate collection

2017-06-08 Thread Johannes Knaus
Hi,
I have a solr cloud setup, with document routing (implicit routing with router 
field). As the index is about documents with a publication date, I routed 
according to the publication year, as in my case, most of the search queries will 
have a year specified.


Now, what would be the best strategy, as regards performance (i.e. a huge 
number of queries to be processed), for search queries without any year 
specified? 

1 - Is it enough to define that these queries should go over all routes (i.e. 
route=year1, year2, ..., yearN)?

2 - Would it be better to add a separate node with a separate index that is not 
routed (but maybe sharded/split)? If so, how should I deal with such a 
separate index? Is it possible to add it to my existing Solr cloud? Would it go 
into a separate collection?
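
To make option 1 concrete, here is a minimal sketch that builds a select URL with and without the `_route_` request parameter. The host, the collection name `publications`, and the route values `year2016` and `year2017` are assumptions for illustration. As far as I understand, a request without `_route_` is fanned out to all shards of the collection, so listing every year explicitly should not be necessary; please verify this against your Solr version.

```python
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this address.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection, query, routes=None):
    """Build a /select URL, restricting the search to given routes if any."""
    params = {"q": query}
    if routes:
        # Several routes are passed as one comma-separated value.
        params["_route_"] = ",".join(routes)
    return SOLR_BASE + "/" + collection + "/select?" + urlencode(params)


# Query with a year specified: only the matching routes are searched.
routed = select_url("publications", "title:solr", ["year2016", "year2017"])

# Query without a year: no _route_ parameter at all.
unrouted = select_url("publications", "title:solr")

print(routed)
print(unrouted)
```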

Thanks for your advice.

Johannes 

Re: AW: What does the replication factor parameter in collections api do?

2017-04-16 Thread Johannes Knaus
Thank you all very much for your answers. That definitely explains it.
All the best,
Johannes

> On 13.04.2017 at 17:03, Erick Erickson wrote:
> 
> bq: Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API
> 
> Because MODIFYCOLLECTION just changes properties in the collection
> definition generically and replicationFactor just happens to be one.
> IOW there's no overarching reason.
> 
> It would be extra work to dis-allow that one case and possibly
> introduce errors without changing any functionality so nobody was
> willing to put in the effort.
> 
> Best,
> Erick
> 
>> On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey  wrote:
>>> On 4/13/2017 3:22 AM, Johannes Knaus wrote:
>>> Ok. Thank you for your quick reply. Though I still feel a little
>>> uneasy. Why is it possible then to alter replicationFactor via
>>> MODIFYCOLLECTION in the collections API? What would be the use case
>>> for this parameter at all then?
>> 
>> If you use a very specific storage method for your indexes -- HDFS --
>> then replicationFactor has meaning beyond initial collection creation,
>> in conjunction with the "autoAddReplicas" feature.
>> 
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud
>> 
>> If you are NOT utilizing the very specific HDFS storage engine, then
>> everything you were told applies.  With standard storage mechanisms,
>> replicationFactor has zero meaning after initial collection creation,
>> and changing the value will have no effect.
>> 
>> Thanks,
>> Shawn
>> 


AW: What does the replication factor parameter in collections api do?

2017-04-13 Thread Johannes Knaus
Ok. Thank you for your quick reply. 
Though I still feel a little uneasy. Why is it possible then to alter 
replicationFactor via MODIFYCOLLECTION in the collections API? What would be 
the use case for this parameter at all then?


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, April 12, 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

really <3>. replicationFactor is used to set up your collection initially, you 
have to be able to change your topology afterwards so it's ignored thereafter.

Once your replica is added, it's automatically made use of by the collection.
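
Adding a replica explicitly could look like the sketch below, which builds a Collections API ADDREPLICA URL without sending it. The host, the collection name, the shard name `Shard001`, and the node name are assumptions for illustration; the `node` parameter is optional and can be left out to let Solr pick a node.

```python
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this address.
SOLR_BASE = "http://localhost:8983/solr"


def add_replica_url(collection, shard, node=None):
    """Build (but do not send) a Collections API ADDREPLICA URL."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Hypothetical node name in the usual host:port_context form.
        params["node"] = node
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


replica_url = add_replica_url("collection1", "Shard001", "solr-node-07:8983_solr")

print(replica_url)
```

The replicationFactor shown in the Admin UI stays at its original value after such a call, which matches the behavior described in this thread.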

On Wed, Apr 12, 2017 at 9:30 AM, Johannes Knaus  wrote:
> Hi,
>
> I am still quite new to Solr. I have the following setup:
> A SolrCloud setup with
> 38 nodes,
> maxShardsPerNode=2,
> implicit routing with routing field,
> and replication factor=2.
>
> Now, I want to add replicas. This works fine by first increasing the 
> maxShardsPerNode to a higher number and then adding replicas.
> So far, so good. I can confirm changes of the maxShardsPerNode parameter and 
> added replicas in the Admin UI.
> However, the Solr Admin UI still is showing me a replication factor of 2.
> I am a little confused about what the replicationfactor parameter actually 
> does in my case:
>
> 1) What does that mean? Does Solr make use of all replicas I have or only of 
> two?
> 2) Do I need to increase the replication factor value as well to really have 
> more replicas available and usable? If this is true, do I need to 
> restart/reload the collection, upload the configs to ZooKeeper again, or 
> anything like that?
> 3) Or is replicationfactor just a parameter that is needed for the first 
> start of SolrCloud and can be ignored afterwards?
>
> Thank you very much for your help,
> All the best,
> Johannes
>


What does the replication factor parameter in collections api do?

2017-04-12 Thread Johannes Knaus
Hi,

I am still quite new to Solr. I have the following setup:
A SolrCloud setup with 
38 nodes, 
maxShardsPerNode=2, 
implicit routing with routing field, 
and replication factor=2.

Now, I want to add replicas. This works fine by first increasing the 
maxShardsPerNode to a higher number and then adding replicas.
So far, so good. I can confirm changes of the maxShardsPerNode parameter and 
added replicas in the Admin UI.
However, the Solr Admin UI still is showing me a replication factor of 2.
I am a little confused about what the replicationfactor parameter actually does 
in my case:

1) What does that mean? Does Solr make use of all replicas I have or only of 
two?
2) Do I need to increase the replication factor value as well to really have 
more replicas available and usable? If this is true, do I need to 
restart/reload the collection, upload the configs to ZooKeeper again, or 
anything like that?
3) Or is replicationfactor just a parameter that is needed for the first start 
of SolrCloud and can be ignored afterwards?

Thank you very much for your help,
All the best,
Johannes