Re: SolrCloud leaders using more disk space

2014-06-30 Thread Greg Pendlebury
Thanks for the reply Tim.

> Can you diff the listings of the index data directories on a leader vs.
> replica?

It was a good tip, and mirrors some stuff we have been exploring in-house
as well. The leaders all have additional 'index.' directories on disk,
but we have come to the conclusion that this is a coincidence and not
related to the fact that they are leaders.

Current theory is that they are the result of an upgrade rehearsal that was
performed before launch where the cluster was split into two on different
versions of Solr and different ZK paths. I suspect that whilst the ops team
were doing the deployment there were a number of server restarts that
triggered leader elections and recovery events that weren't allowed to
complete gracefully, leaving the old data on disk.
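For anyone wanting to check their own hosts for the same thing, here is a rough sketch of the kind of scan involved. It assumes each core's data directory holds a live `index` directory plus any leftover `index.*` siblings; the function names and the layout are illustrative, not something taken from Solr itself.

```python
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files underneath path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def find_index_dirs(data_dir):
    """Map each 'index' or 'index.*' directory in data_dir to its size in bytes.

    Assumption: leftover copies sit next to the live index directory and
    share the 'index.' name prefix.
    """
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(Path(data_dir).iterdir())
        if child.is_dir()
        and (child.name == "index" or child.name.startswith("index."))
    }
```

More than one entry in the result for a single core is the signal worth a closer look. Which directory is the live one still has to be confirmed separately before anything is deleted.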

The coincidence is simply that the ops team did all their initial practice
stuff on the same 3 hosts, which later became our leaders. I've found a few
small, similar issues on hosts 4-6, and none at all on hosts 7-9.

I'm hoping we get a chance to test all this soon, but we need to re-jig our
test systems first, since they don't have any redundancy depth to them
right now.

Ta,
Greg





Re: SolrCloud leaders using more disk space

2014-06-27 Thread Timothy Potter
Hi Greg,

Sorry for the slow response. The general thinking is that you
shouldn't worry about which nodes host leaders vs. replicas because A)
that can change, and B) as you say, the additional responsibilities
for leader nodes are quite minimal (mainly per-doc version management
and then distributing updates to replicas). The segment merging all
happens at the Lucene level, which has no knowledge of SolrCloud
leaders / replicas. Since this is SolrCloud, all nodes pull the config
from ZooKeeper so should be running the same settings. Can you diff
the listings of the index data directories on a leader vs. replica?
Might give us some insight into which files the leader has that the
replicas don't have.
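If it is easier than eyeballing two `ls` outputs, something along these lines would do the comparison. This is only a sketch: the helper names are made up, and since the leader and replica live on different hosts you would capture a listing on each and compare the two afterwards.

```python
from pathlib import Path


def listing(data_dir):
    """Relative path -> size in bytes for every file under data_dir."""
    root = Path(data_dir)
    return {
        f.relative_to(root).as_posix(): f.stat().st_size
        for f in root.rglob("*")
        if f.is_file()
    }


def diff_listings(leader, replica):
    """Return (paths only on the leader, paths only on the replica), sorted."""
    only_leader = sorted(set(leader) - set(replica))
    only_replica = sorted(set(replica) - set(leader))
    return only_leader, only_replica


# Hypothetical listings, purely to show the shape of the result.
leader = {"index/_a.cfs": 100, "index.20140601/_b.cfs": 250, "tlog/tlog.001": 5}
replica = {"index/_a.cfs": 100, "tlog/tlog.001": 5}
only_leader, only_replica = diff_listings(leader, replica)
print(only_leader)   # ['index.20140601/_b.cfs']
print(only_replica)  # []
```

Individual segment file names may well differ between replicas anyway, so the top-level directory names and their total sizes are probably the more telling part of the output.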

Cheers,
Tim



SolrCloud leaders using more disk space

2014-06-03 Thread Greg Pendlebury
Hi all,

We launched our new production instance of SolrCloud last week and since
then have noticed a trend with regard to disk usage. The non-leader
replicas all seem to be self-optimising their index segments as expected,
but the leaders have (on average) around 33% more data on disk. My
assumption is that leaders are not self-optimising (or not to the same
extent)... but it is still early days of course.

If it helps, there are 45 JVMs in the cloud, with 15 shards and 3 replicas
per shard. Each non-leader replica is sitting at between 59GB and 87GB on
their SSD, but the leaders are between 84GB and 116GB.
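As a quick sanity check on those numbers, the arithmetic can be sketched as below. It assumes one leader per shard, and it uses the midpoints of the quoted ranges as a crude stand-in for the real averages, so the percentage is only indicative.

```python
shards, replicas_per_shard = 15, 3
servers, jvms_per_server = 9, 5

cores = shards * replicas_per_shard   # one core per JVM in this layout
jvms = servers * jvms_per_server
leaders = shards                      # assumption: exactly one leader per shard
non_leaders = cores - leaders


def midpoint(low_gb, high_gb):
    """Midpoint of a size range in GB (not the true average)."""
    return (low_gb + high_gb) / 2


non_leader_mid = midpoint(59, 87)
leader_mid = midpoint(84, 116)
extra_pct = (leader_mid / non_leader_mid - 1) * 100

print(cores, jvms, leaders, non_leaders)
print(f"leaders hold roughly {extra_pct:.0f}% more by range midpoints")
```

The midpoint figure lands in the same ballpark as the measured average of around 33%, which is all this is meant to show.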

We have pretty much constant read and write traffic 24x7, with just 'slow'
periods overnight when write traffic is < 1 document per second and
searches are between 1 and 2 per second. Is this light level of traffic
still too much for the leaders to self-optimise?

I'd also be curious to hear about what others are doing in terms of
operating procedures. Before launch we load tested what would happen if we
turned off JVMs and forced recovery events. I know that these things all
work, just that customers will experience slower search responses whilst
they occur. For example, a restore from a leader to a replica under load
testing for us takes around 30 minutes and response times degrade from around
200-300ms average to 1.5s average.

The bottleneck appears to be network I/O on the servers. We haven't explored
whether this is specific to the servers replicating, or saturation of the
infrastructure that all the servers share, because...

This performance is acceptable for us, but I'm not sure if I'd like to
force that event to occur unless required... this is following the line of
reasoning proposed internally that we should periodically rotate leaders by
turning them off briefly. We aren't going to do that unless we have a
strong reason though. Does anyone try to manipulate production instances
that way?

Vaguely related to this is leader distribution. We have 9 physical servers
and 5 JVMs running on each server. By virtue of the deployment procedures
the first 3 servers to come online are all running 5 leaders each. Is there
any merit in 'moving' these around (by reboots)?

Our planning up to launch was based on lots of mailing list responses we'd
seen that indicated leaders had no significant performance difference to
normal replicas, and all of our testing has agreed with that. The disk size
'issue' (which we aren't worried about... yet. It hasn't been in prod long
enough to know for certain) may be the only thing we've seen so far.

Ta,
Greg


Re: SolrCloud Leaders

2013-04-22 Thread Furkan KAMACI
Hi Jack;

You said: "An hour from now some other replica may be the leader."

What are the criteria for changing the leader of a shard?








Re: SolrCloud Leaders

2013-04-22 Thread Otis Gospodnetic
If the current leader dies, somebody's got to take over.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/












Re: SolrCloud Leaders

2013-04-22 Thread Jack Krupansky
Leader election will result from nodes coming up and going down as well as 
changes in network connectivity and even simply responsiveness between the 
nodes. A quorum is always needed.


There may be other reasons as well that I don't know about.

The point was simply that it is not a leader vs. replica issue - all of 
the nodes are replicas and one replica just happens to be playing the
role of leader at a given moment.
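To make the "role at a given moment" idea concrete, here is a toy model. It is not Solr's election code: it simply assumes replicas queue up in the order they joined and the first live one in the queue acts as leader, which is a common election pattern. All names are hypothetical.

```python
def elect_leader(queue, live):
    """Return the first replica in join order that is currently live, or None."""
    for replica in queue:
        if replica in live:
            return replica
    return None


queue = ["replica_a", "replica_b", "replica_c"]  # hypothetical join order
live = {"replica_a", "replica_b", "replica_c"}
print(elect_leader(queue, live))  # replica_a

# The current leader goes down, so the next live replica takes over.
live.discard("replica_a")
print(elect_leader(queue, live))  # replica_b

# In this toy model a returning replica rejoins at the back of the queue,
# so it comes back as an ordinary replica rather than reclaiming the role.
queue.remove("replica_a")
queue.append("replica_a")
live.add("replica_a")
print(elect_leader(queue, live))  # replica_b
```

Nothing about a replica's data or abilities changes in any of these steps. Only the answer to "who is first in line right now" does.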


-- Jack Krupansky












SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Is the number of leaders in a SolrCloud cluster equal to the number of shards?


Re: SolrCloud Leaders

2013-04-15 Thread Upayavira
It is supposed to be one leader per shard, yes.

Upayavira



Re: SolrCloud Leaders

2013-04-15 Thread Jack Krupansky
When the cluster is fully operational, yes. But if part of the cluster is 
down or split and unable to communicate, or leader election is in progress, 
the actual count of leaders will not be indicative of the number of shards.


Leaders and shards are apples and oranges. If you take down a cluster, by 
definition it would have no leaders (because leaders are running code), but 
shards are the files in the index on disk that continue to exist even if the 
code is not running. So, in the extreme, the number of leaders can be zero 
while the number of shards is non-zero on disk.
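A small model may make the distinction clearer. This is only an illustration with made-up names, not how Solr represents cluster state: shards are counted from what exists, leaders only from what is running.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    shard: str
    node: str
    live: bool = True      # is the code for this replica running?
    leader: bool = False   # was this replica elected leader?


def count_shards(replicas):
    """Shards exist as long as their index data exists, running or not."""
    return len({r.shard for r in replicas})


def count_leaders(replicas):
    """Only a running replica can be acting as a leader."""
    return sum(1 for r in replicas if r.live and r.leader)


cluster = [
    Replica("shard1", "node1", leader=True),
    Replica("shard1", "node2"),
    Replica("shard2", "node1"),
    Replica("shard2", "node2", leader=True),
]
print(count_shards(cluster), count_leaders(cluster))  # 2 2

# Take the whole cluster down: the shards remain, the leaders do not.
for r in cluster:
    r.live = False
print(count_shards(cluster), count_leaders(cluster))  # 2 0
```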


-- Jack Krupansky




Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Can leaders respond to search requests (I mean, do they store indexes), both
when I first start SolrCloud and some time later?




Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
This page:
https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
says:

"Both leaders and replicas index items and perform searches."

How do replicas index items?







Re: SolrCloud Leaders

2013-04-15 Thread Jack Krupansky
All nodes are replicas in SolrCloud since there are no masters. It's a fully 
distributed model. A leader is also a replica. A leader is simply a replica 
which was elected to be a leader, for now. An hour from now some other 
replica may be the leader.


It is indeed misleading and inaccurate to suggest that leaders and
replicas are disjoint.


Once again, I think you are confusing SolrCloud with the older Solr 
master/slave/replication.


Every node in SolrCloud can do indexing. That's the same as saying that 
every replica in SolrCloud can do indexing.


Although we do need to be clear that a given replica will only index 
documents for the shard(s) to which it belongs.
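As a rough illustration of that last point, the sketch below routes a document to a shard by hashing its id. The CRC32-modulo scheme is a deliberately simple stand-in, not Solr's actual router, and the node names are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard index from the document id (toy hash routing)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def replicas_that_index(doc_id, replicas_by_shard):
    """Return the replicas that index this document: those of its shard only."""
    shard = shard_for(doc_id, len(replicas_by_shard))
    return replicas_by_shard[shard]


replicas_by_shard = [
    ["node1", "node2"],  # shard 0: leader and replica alike index its documents
    ["node3", "node4"],  # shard 1
]
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "->", replicas_that_index(doc_id, replicas_by_shard))
```

Every replica of the chosen shard ends up indexing the document, leader or not, while the replicas of the other shard never see it.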


-- Jack Krupansky
