Re: SolrCloud leaders using more disk space
Thanks for the reply, Tim.

On 28 June 2014 02:59, Timothy Potter thelabd...@gmail.com wrote:
> Can you diff the listings of the index data directories on a leader vs. replica?

It was a good tip, and it mirrors some things we have been exploring in house as well. The leaders all have additional 'index.' directories on disk, but we have concluded that this is a coincidence and not related to the fact that they are leaders.

Our current theory is that these directories are left over from an upgrade rehearsal performed before launch, during which the cluster was split into two, running different versions of Solr against different ZK paths. I suspect that while the ops team were doing the deployment, a number of server restarts triggered leader elections and recovery events that weren't allowed to complete gracefully, leaving the old data on disk. The coincidence is simply that the ops team did all their initial practice on the same 3 hosts, which later became our leaders. I've found a few similar, smaller issues on hosts 4-6, and none at all on hosts 7-9.

I'm hoping we get a chance to test all this soon, but we need to re-jig our test systems first, since they don't have any redundancy depth to them right now.

Ta,
Greg
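One way to check Greg's theory on each host is to list the `index*` directories under a core's data directory and flag every one that is not the live index. This is a minimal sketch, assuming (not stated in the thread) that the data directory holds either a plain `index/` directory or an `index.properties` file whose `index` key names the active `index.<timestamp>` directory, as Solr 4.x replication typically writes after a full copy. The script and function names are hypothetical, and anything it reports should be reviewed by hand rather than deleted automatically.

```python
import sys
from pathlib import Path
from typing import Optional


def live_index_name(properties_text: Optional[str]) -> str:
    """Name of the live index directory.

    Assumption: data/index.properties (if present) contains a line such as
    'index=index.20140601123456789'; without it, the live index is 'index'.
    Simplified parser: only 'key=value' lines are understood.
    """
    if properties_text:
        for raw in properties_text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", "!")) or "=" not in line:
                continue  # skip blanks, comments and unsupported syntax
            key, value = line.split("=", 1)
            if key.strip() == "index":
                return value.strip()
    return "index"


def stale_index_dirs(dir_names: list, live: str) -> list:
    """All index* directory names other than the live one, sorted."""
    return sorted(n for n in dir_names if n.startswith("index") and n != live)


def scan(data_dir: Path) -> list:
    """Return (name, total bytes) for each stale index directory under data_dir."""
    props = data_dir / "index.properties"
    text = props.read_text(encoding="utf-8") if props.is_file() else None
    live = live_index_name(text)
    subdirs = [p.name for p in data_dir.iterdir() if p.is_dir()]
    result = []
    for name in stale_index_dirs(subdirs, live):
        size = sum(f.stat().st_size for f in (data_dir / name).rglob("*") if f.is_file())
        result.append((name, size))
    return result


if __name__ == "__main__":
    # Usage: python stale_index.py /path/to/core/data
    for name, size in scan(Path(sys.argv[1])):
        print(f"{name}\t{size / 1024**3:.1f} GiB")
```

Separating the parsing and filtering from the filesystem walk keeps the decision logic easy to check before pointing the script at a production host.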
Re: SolrCloud leaders using more disk space
Hi Greg,

Sorry for the slow response. The general thinking is that you shouldn't worry about which nodes host leaders vs. replicas because A) that can change, and B) as you say, the additional responsibilities of leader nodes are quite minimal (mainly per-document version management and distributing updates to replicas). Segment merging all happens at the Lucene level, which has no knowledge of SolrCloud leaders or replicas. Since this is SolrCloud, all nodes pull their config from ZooKeeper, so they should be running the same settings.

Can you diff the listings of the index data directories on a leader vs. a replica? That might give us some insight into which files the leader has that the replicas don't.

Cheers,
Tim

On Tue, Jun 3, 2014 at 8:32 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote:
> Hi all, We launched our new production instance of SolrCloud last week and since then have noticed a trend with regards to disk usage. [...]
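Tim's suggested diff could be scripted roughly as follows. A caveat, stated as an assumption rather than something from the thread: because each replica indexes and merges independently, Lucene segment file names and sizes will usually differ between a leader and its replicas even when their content matches, so this sketch compares total sizes per top-level entry of the data directory (`index`, `tlog`, extra `index.*` directories, and so on) instead of individual segment files. The script name is hypothetical.

```python
import sys
from pathlib import Path


def top_level_sizes(data_dir: Path) -> dict:
    """Total bytes for each top-level entry directly under data_dir."""
    sizes = {}
    for entry in data_dir.iterdir():
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        else:
            sizes[entry.name] = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
    return sizes


def diff_listings(leader: dict, replica: dict) -> dict:
    """Compare two {name: size} listings from a leader and a replica."""
    return {
        "only_on_leader": sorted(leader.keys() - replica.keys()),
        "only_on_replica": sorted(replica.keys() - leader.keys()),
        "size_differs": sorted(
            name for name in leader.keys() & replica.keys() if leader[name] != replica[name]
        ),
    }


if __name__ == "__main__":
    # Usage: python diff_data_dirs.py LEADER_DATA_DIR REPLICA_DATA_DIR
    leader = top_level_sizes(Path(sys.argv[1]))
    replica = top_level_sizes(Path(sys.argv[2]))
    for category, names in diff_listings(leader, replica).items():
        print(f"{category}:")
        for name in names:
            print(f"  {name}\tleader={leader.get(name, 0)}\treplica={replica.get(name, 0)}")
```

Entries that appear only on the leader are the interesting ones here; in Greg's case they would be the extra `index.` directories he describes.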
SolrCloud leaders using more disk space
Hi all,

We launched our new production instance of SolrCloud last week, and since then we have noticed a trend with regard to disk usage. The non-leader replicas all seem to be self-optimising their index segments as expected, but the leaders have (on average) around 33% more data on disk. My assumption is that the leaders are not self-optimising (or not to the same extent)... but it is still early days, of course.

If it helps, there are 45 JVMs in the cloud, with 15 shards and 3 replicas per shard. Each non-leader replica is sitting at between 59GB and 87GB on its SSD, but the leaders are between 84GB and 116GB. We have pretty much constant read and write traffic 24x7, with just 'slow' periods overnight when write traffic is 1 document per second and searches are between 1 and 2 per second. Is this light level of traffic still too much for the leaders to self-optimise?

I'd also be curious to hear what others are doing in terms of operating procedures. Before launch we load tested what would happen if we turned off JVMs and forced recovery events. I know that these things all work; it's just that customers will experience slower search responses while they occur. For example, under load testing, a restore from a leader to a replica takes us around 30 minutes, and average response times go from around 200-300ms to 1.5s. The bottleneck appears to be network I/O on the servers. We haven't explored whether this is specific to the servers doing the replication or due to saturation of the infrastructure that all the servers share, because this performance is acceptable for us. Still, I'm not sure I'd like to force that event to occur unless required. This follows a line of reasoning proposed internally: that we should periodically rotate leaders by briefly turning them off. We aren't going to do that unless we have a strong reason, though. Does anyone manipulate production instances that way?

Vaguely related to this is leader distribution. We have 9 physical servers and 5 JVMs running on each server. By virtue of the deployment procedures, the first 3 servers to come online are each running 5 leaders. Is there any merit in 'moving' these around (by reboots)? Our planning up to launch was based on many mailing list responses we'd seen indicating that leaders have no significant performance difference from normal replicas, and all of our testing has agreed with that. The disk size 'issue' (which we aren't worried about... yet; it hasn't been in prod long enough to know for certain) may be the only difference we've seen so far.

Ta,
Greg
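Greg's numbers are consistent: 9 servers × 5 JVMs = 45 JVMs = 15 shards × 3 replicas. The sketch below counts leaders per physical host from a shard-to-leader-host mapping and compares it with the most any host would hold under an even spread, ⌈15 / 9⌉ = 2. How that mapping is obtained (for example, from the cluster state in ZooKeeper or the admin UI) is left out, and the host and shard names are hypothetical.

```python
from collections import Counter
from math import ceil


def leaders_per_host(shard_leader_host: dict, hosts: list) -> dict:
    """Count the shard leaders on each physical host (0 for hosts with none)."""
    counts = Counter(shard_leader_host.values())
    return {host: counts.get(host, 0) for host in hosts}


def max_even_share(num_shards: int, num_hosts: int) -> int:
    """Most leaders any host would hold if leaders were spread as evenly as possible."""
    return ceil(num_shards / num_hosts)


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(1, 10)]  # hypothetical names for the 9 servers
    # The layout described above: all 15 shard leaders sit on the first 3 hosts.
    layout = {f"shard{s}": hosts[(s - 1) // 5] for s in range(1, 16)}
    print(leaders_per_host(layout, hosts))
    print("even spread: at most", max_even_share(len(layout), len(hosts)), "per host")
```

As Tim notes later in the thread, leaders do little extra work, so this is mainly a visibility check rather than a reason to force re-elections.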
Re: SolrCloud Leaders
Hi Jack,

You said: "An hour from now some other replica may be the leader." What are the criteria for changing the leader of a shard?

2013/4/15 Jack Krupansky j...@basetechnology.com
> All nodes are replicas in SolrCloud since there are no masters. It's a fully distributed model. A leader is also a replica. [...]
Re: SolrCloud Leaders
If the current leader dies, somebody's got to take over.

Otis
--
Solr ElasticSearch Support
http://sematext.com/

On Mon, Apr 22, 2013 at 9:41 AM, Furkan KAMACI furkankam...@gmail.com wrote:
> Hi Jack, You said: "An hour from now some other replica may be the leader." What are the criteria for changing the leader of a shard? [...]
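Otis's point, that somebody has to take over when the leader dies, can be illustrated with an in-memory model of the ZooKeeper ephemeral-sequential leader-election recipe, which is how SolrCloud's per-shard election is commonly described. This is a simplified sketch, not Solr code: no ZooKeeper client is involved, the node names are hypothetical, and any extra work real SolrCloud does before a new leader takes over is omitted.

```python
import itertools
from typing import Optional


class ElectionModel:
    """In-memory stand-in for a ZooKeeper ephemeral-sequential election.

    Each candidate that joins gets the next sequence number; the live candidate
    with the lowest number is the leader. When a node's session ends, its entry
    disappears and the next-lowest candidate becomes leader.
    """

    def __init__(self) -> None:
        self._seq = itertools.count()
        self._candidates = {}  # sequence number -> node name

    def join(self, node: str) -> int:
        seq = next(self._seq)
        self._candidates[seq] = node
        return seq

    def leave(self, node: str) -> None:
        """Simulate a node dying: its ephemeral entries vanish."""
        self._candidates = {s: n for s, n in self._candidates.items() if n != node}

    def leader(self) -> Optional[str]:
        if not self._candidates:
            return None  # no live candidates, so no leader at all
        return self._candidates[min(self._candidates)]


if __name__ == "__main__":
    shard1 = ElectionModel()
    for node in ("node-a", "node-b", "node-c"):  # hypothetical node names
        shard1.join(node)
    print(shard1.leader())  # node-a
    shard1.leave("node-a")  # the current leader dies
    print(shard1.leader())  # node-b takes over
    shard1.join("node-a")   # node-a restarts and rejoins at the back of the queue
    print(shard1.leader())  # still node-b
```

The last step shows why restarting a leader tends to move leadership elsewhere rather than back to the restarted node.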
Re: SolrCloud Leaders
Leader election will result from nodes coming up and going down, as well as from changes in network connectivity and even simply in responsiveness between the nodes. A quorum is always needed. There may be other reasons as well that I don't know about. The point was simply that this is not a leader vs. replica issue: all of the nodes are replicas, and one replica just happens to be playing the role of leader at a given moment.

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Monday, April 22, 2013 9:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

Hi Jack, You said: "An hour from now some other replica may be the leader." What are the criteria for changing the leader of a shard? [...]
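Jack doesn't say which quorum he means. Assuming it is the ZooKeeper ensemble that SolrCloud relies on for elections and cluster state, the arithmetic is simple: ZooKeeper keeps working only while a strict majority of its servers can reach each other. A minimal sketch of that majority rule:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if enough servers can communicate to form a majority."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)}")
```

A 4-server ensemble tolerates no more failures than a 3-server one (1 each), which is why odd ensemble sizes are the usual choice.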
SolrCloud Leaders
Is the number of leaders in a SolrCloud cluster equal to the number of shards?
Re: SolrCloud Leaders
It is supposed to be one leader per shard, yes.

Upayavira

On Mon, Apr 15, 2013, at 01:21 PM, Furkan KAMACI wrote:
> Is the number of leaders in a SolrCloud cluster equal to the number of shards?
Re: SolrCloud Leaders
When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards.

Leaders and shards are apples and oranges. If you take down a cluster, by definition it would have no leaders (because leaders are running code), but shards are the files in the index on disk that continue to exist even if the code is not running. So, in the extreme, the number of leaders can be zero while the number of shards on disk is non-zero.

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Monday, April 15, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud Leaders

Is the number of leaders in a SolrCloud cluster equal to the number of shards?
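Jack's distinction can be checked mechanically: count the leaders per shard and flag shards that currently have none. This sketch works on a cluster-state dictionary whose layout is loosely modelled on Solr 4.x `clusterstate.json` (collection, then `shards`, then `replicas`, with a `"leader": "true"` flag); treat those field names as assumptions. It counts only replicas whose state is `active`, so a leader flag left on a down replica is not counted.

```python
import json


def leader_count_by_shard(cluster_state: dict, collection: str) -> dict:
    """Count active replicas flagged as leader in each shard of one collection.

    Assumed layout: {collection: {"shards": {shard: {"replicas": {name: {...}}}}}}
    """
    counts = {}
    for shard_name, shard in cluster_state[collection]["shards"].items():
        counts[shard_name] = sum(
            1
            for replica in shard.get("replicas", {}).values()
            if replica.get("leader") == "true" and replica.get("state") == "active"
        )
    return counts


def shards_without_leader(counts: dict) -> list:
    """Shards that exist but currently have no active leader."""
    return sorted(name for name, n in counts.items() if n == 0)


if __name__ == "__main__":
    # Hypothetical two-shard state in which shard2 has no active leader yet.
    state = json.loads("""
    {"collection1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"state": "active", "leader": "true"},
            "core_node2": {"state": "active"}}},
        "shard2": {"replicas": {
            "core_node3": {"state": "recovering"},
            "core_node4": {"state": "down"}}}}}}
    """)
    counts = leader_count_by_shard(state, "collection1")
    print(counts)                         # {'shard1': 1, 'shard2': 0}
    print(shards_without_leader(counts))  # ['shard2']
```

In a fully operational cluster every shard should report exactly 1; during a partition or an election, some shards can report 0 even though their data is still on disk.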
Re: SolrCloud Leaders
Can leaders respond to search requests (I mean, do they store indexes), both when I first start SolrCloud and later on?

2013/4/15 Jack Krupansky j...@basetechnology.com
> When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards. [...]
Re: SolrCloud Leaders
This page says something relevant: https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and

It says: "Both leaders and replicas index items and perform searches." How do replicas index items?

2013/4/15 Furkan KAMACI furkankam...@gmail.com
> Can leaders respond to search requests (I mean, do they store indexes), both when I first start SolrCloud and later on? [...]
Re: SolrCloud Leaders
All nodes are replicas in SolrCloud since there are no masters. It's a fully distributed model. A leader is also a replica; a leader is simply a replica that was elected to be leader, for now. An hour from now some other replica may be the leader. It is indeed misleading and inaccurate to suggest that leaders and replicas are disjoint. Once again, I think you are confusing SolrCloud with the older Solr master/slave replication.

Every node in SolrCloud can do indexing, which is the same as saying that every replica in SolrCloud can do indexing. We do need to be clear, though, that a given replica will only index documents for the shard(s) to which it belongs.

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Monday, April 15, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

This page says something relevant: https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and It says: "Both leaders and replicas index items and perform searches." How do replicas index items? [...]
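To make Jack's last point concrete (every replica indexes, but only documents for its own shard), here is a toy model that also follows the division of labour Tim Potter describes in the 2014 thread above: the shard's leader stamps a version on each update and distributes it to the other replicas, which each index it locally. It is a simplified sketch under stated assumptions, not Solr's implementation: `zlib.crc32` stands in for the MurmurHash3 hash that Solr's compositeId router applies to the document id, the version counter is a plain integer rather than Solr's `_version_` scheme, and all shard and replica names are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # doc id -> (version, document); stands in for this replica's local index
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, version: int, doc: dict) -> None:
        """Apply an update locally, ignoring anything older than what is stored."""
        current = self.docs.get(doc_id)
        if current is None or version > current[0]:
            self.docs[doc_id] = (version, doc)


@dataclass
class Shard:
    replicas: list
    leader_idx: int = 0    # which replica currently plays the leader role
    next_version: int = 1  # simplified monotonic counter

    def update(self, doc_id: str, doc: dict) -> int:
        """Leader stamps a version, indexes locally, then forwards to every other replica."""
        version = self.next_version
        self.next_version += 1
        self.replicas[self.leader_idx].index(doc_id, version, doc)
        for i, replica in enumerate(self.replicas):
            if i != self.leader_idx:
                replica.index(doc_id, version, doc)
        return version


def route(doc_id: str, num_shards: int) -> int:
    """Pick a shard by splitting the 32-bit hash space into equal ranges.

    Stand-in only: zlib.crc32 replaces the MurmurHash3 used by compositeId routing.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # 0 <= h < 2**32
    return (h * num_shards) >> 32


if __name__ == "__main__":
    # Hypothetical layout: 2 shards, 3 replicas each.
    shards = [Shard([Replica(f"shard{s}_replica{r}") for r in range(3)]) for s in range(2)]
    for doc_id in ("doc1", "doc2", "doc3", "doc4"):
        s = route(doc_id, len(shards))
        version = shards[s].update(doc_id, {"id": doc_id})
        print(f"{doc_id} -> shard{s} (version {version})")
    # Replicas within a shard hold identical documents; other shards never see them.
    for s, shard in enumerate(shards):
        print(f"shard{s}:", [sorted(r.docs) for r in shard.replicas])
```

Changing `leader_idx` models a re-election: indexing continues unchanged, because the leader is just whichever replica currently holds the role.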