Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Great. Thanks for your suggestions. I'll go through them and see what I can come up with to try and tame my GC pauses. I'll also make sure I upgrade to 4.4 before I start, so at least I know I've got all the latest changes.

In the meantime, does anyone have any idea why I am able to get leaders which are marked as down? I've just had a situation where, of two nodes hosting replicas of the same shard, the leader was alive but marked as down and the other replica was gone. I could perform searches directly on the two nodes (with distrib=false), and once I'd restarted the node which was down the leader sprang into life. I assume that the change in clusterstate.json forced the leader to reconsider what it was up to.

Does anyone know the hole my nodes are falling into? Is it likely to be tied up in my GC woes?

On 23 July 2013 13:06, Otis Gospodnetic otis.gospodne...@gmail.com wrote: [...]
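For reference, checking a single node directly looks like this (host and core name are placeholders for whatever your cluster uses):

    # hits only the local core; no fan-out to the other shards
    curl 'http://server04:8983/solr/collection1/select?q=*:*&distrib=false&rows=0'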
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Log messages?

On Wed, Jul 24, 2013 at 1:37 AM, Neil Prosser neil.pros...@gmail.com wrote: [...]
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Sorry, good point... https://gist.github.com/neilprosser/d75a13d9e4b7caba51ab

I've included the log files for two servers hosting the same shard, for the same time period. The logging settings exclude anything below WARN for org.apache.zookeeper, org.apache.solr.core.SolrCore and org.apache.solr.update.processor.LogUpdateProcessor. That said, there's still a lot of spam there. The log for server09 starts with it throwing OutOfMemoryErrors. At this point my external monitoring has it listed as recovering. Unfortunately I haven't got the GC logs for either box for that time period.

The key times that I know of are:

2013-07-24 07:14:08,560 - server04 registers its state as down.
2013-07-24 07:17:38,462 - server04 says it's the new leader (this ties in with my external Graphite script observing that at 07:17 server04 was both leader and down).
2013-07-24 07:31:21,667 - I get involved and server09 is restarted.
2013-07-24 07:31:42,408 - server04 updates its cloud state from ZooKeeper and realises that it's the leader.
2013-07-24 07:31:42,449 - server04 registers its state as active.

I'm sorry there's so much there. I'm still getting used to what's important for people. Both servers were running 4.3.1. I've since upgraded to 4.4.0. If you need any more information or want me to do any filtering let me know.

On 24 July 2013 15:50, Timothy Potter thelabd...@gmail.com wrote: [...]
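For anyone who wants the same filtering, the log4j.properties entries look roughly like this (assuming the stock log4j setup Solr 4.3+ ships with; a sketch, not my exact file):

    log4j.logger.org.apache.zookeeper=WARN
    log4j.logger.org.apache.solr.core.SolrCore=WARN
    log4j.logger.org.apache.solr.update.processor.LogUpdateProcessor=WARN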
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
One thing I'm seeing in your logs is the leaderVoteWait safety mechanism that I mentioned previously:

2013-07-24 07:06:19,856 INFO o.a.s.c.ShardLeaderElectionContext - Waiting until we see more replicas up: total=2 found=1 timeoutin=45792

From Mark M: "This is a safety mechanism - you can turn it off by configuring leaderVoteWait to 0 in solr.xml. This is meant to protect the case where you stop a shard or it fails and then the first node to get started back up has stale data - you don't want it to just become the leader. So we wait to see everyone we know about in the shard, up to 3 or 5 min by default. Then we know all the shards participate in the leader election and the leader will end up with all the updates it should have. You can lower that wait or turn it off with 0."

NOTE: I tried setting it to 0 and my cluster went haywire, so consider just lowering it but not making it zero ;-)

From what I can tell, server04 heads into this leaderVoteWait loop before it declares success as the leader, which is why it is registered as down. Down is different from gone, so it's likely it can still respond to queries.

On Wed, Jul 24, 2013 at 10:33 AM, Neil Prosser neil.pros...@gmail.com wrote: [...]
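For reference, in the pre-4.4 solr.xml format the setting is an attribute on the cores element, in milliseconds. A sketch (the surrounding attributes are placeholders, and the value is only an example):

    <solr persistent="true">
      <!-- leaderVoteWait is in milliseconds; 0 disables the wait (see warning above) -->
      <cores adminPath="/admin/cores" defaultCoreName="collection1" leaderVoteWait="60000">
        <core name="collection1" instanceDir="collection1" />
      </cores>
    </solr>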
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
On 7/24/2013 10:33 AM, Neil Prosser wrote: "The log for server09 starts with it throwing OutOfMemoryErrors. At this point I externally have it listed as recovering. Unfortunately I haven't got the GC logs for either box in that time period."

There are a lot of messages in this thread, so I apologize if this has already been dealt with in previous messages. All bets are off when you start throwing OOM errors. The state of any Java program becomes pretty much completely undefined when you run out of heap memory.

It so happens that I just finished updating a wiki page about reducing heap requirements for another message on the list. GC pause problems have already been mentioned, so increasing the heap may not be a real option here. Take a look at the following for ways to reduce your heap requirements:

https://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements

Thanks,
Shawn
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
That makes sense about all bets being off. I wanted to make sure that people whose systems are behaving sensibly weren't going to have problems.

I think I need to tame the base amount of memory the field cache takes. We currently do boosting on several fields during most queries. We boost by at least 64 long fields (only one or two per query, but that many fields in total), so with a maxDocs of around 10m on each shard we're talking about nearly 5GB of heap just for the field cache (64 fields x 10m docs x 8 bytes per long is roughly 5GB). Then I still want to have a filter cache which protects me from chewing too much CPU during requests.

I've seen a lot of concurrent mode failures flying by while staring at the GC log, so I think I need to get GC to kick in a bit sooner and try to limit those (is it possible to eliminate them completely?). Performance is definitely suffering while that's happening. I had a go with G1 and no special options and things slowed down a bit. I'm beginning to understand what to look for in the CMS GC logs so I'll stick with that and see where I get to. Thanks for the link. I'll give it a proper read when I get into work tomorrow.

I have a feeling that five shards might not be the sweet spot for the spec of the machines I'm running on. Our goal was to replace five 96GB physical machines with 48GB heaps doing master/slave replication, for an index which is at least 120GB in size. At the moment we're using ten VMs with 24GB of RAM, 8GB heaps and around 10GB of index. These machines are managing to get the whole index into the Linux OS cache. Hopefully the 5GB minimum for the field cache on top of an 8GB heap is what's causing this trouble right now.

On 24 July 2013 19:06, Shawn Heisey s...@elyograg.org wrote: [...]
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Neil:

Here's a must-read blog post about why allocating more memory to the JVM than Solr requires is a Bad Thing: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

It turns out that you actually do yourself harm by allocating more memory to the JVM than it really needs. Of course the problem is figuring out how much it really needs, which is pretty tricky. Your long GC pauses _might_ be ameliorated by allocating _less_ memory to the JVM, counterintuitive as that seems.

Best
Erick

On Mon, Jul 22, 2013 at 5:05 PM, Neil Prosser neil.pros...@gmail.com wrote: [...]
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Hi,

On Tue, Jul 23, 2013 at 8:02 AM, Erick Erickson erickerick...@gmail.com wrote: [...] "Your long GC pauses _might_ be ameliorated by allocating _less_ memory to the JVM, counterintuitive as that seems."

... or by using G1 :) See http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Mon, Jul 22, 2013 at 5:05 PM, Neil Prosser neil.pros...@gmail.com wrote: [...]
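Trying G1 is just a flag swap on Java 7; something like the following (the heap size and pause goal are examples only, not a recommendation):

    # replace the CMS flags with G1 (Java 7u4 or later)
    java -Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar start.jar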
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Very true. I was impatient (I think less than three minutes impatient, so hopefully 4.4 will save me from myself), but I didn't realise it was doing something rather than just hanging. Next time I have to restart a node I'll just leave it and go get a cup of coffee or something.

My configuration is set to auto hard-commit every 5 minutes. No auto soft-commit time is set.

Over the course of the weekend, while left unattended, the nodes have been going up and down (I've got to solve the issue that is causing them to come and go, but any suggestions on what is likely to be causing something like that are welcome), and at one point one of the nodes stopped taking updates. After indexing properly for a few hours with that one shard not accepting updates, the replica of that shard which contained all the correct documents must have replicated from the broken node and dropped documents. Is there any protection against this in Solr, or should I be focusing on getting my nodes to be more reliable?

I've now got a situation where four of my five shards have leaders who are marked as down and followers who are up. I'm going to start grabbing information about the cluster state so I can track which changes are happening and in what order. I can get hold of Solr logs and garbage collection logs while these things are happening. Is this all just down to my nodes being unreliable?

On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Wow, you really shouldn't be having nodes go up and down so frequently; that's a big red flag. That said, SolrCloud should be pretty robust, so this is something to pursue...

Even a 5-minute hard commit can lead to a hefty transaction log under load, so you may want to reduce it substantially depending on how fast you are sending docs to the index. I'm talking 15-30 seconds here. It's critical that openSearcher be set to false or you'll invalidate your caches that often. All a hard commit with openSearcher set to false does is close off the current segment and open a new one; it does NOT open/warm new searchers etc. The soft commits control visibility, so that's how you control whether you can search the docs or not. Pardon me if I'm repeating stuff you already know!

As far as your nodes coming and going, I've seen some people have good results by upping the ZooKeeper timeout limit. So I guess my first question is whether the nodes are actually going out of service or whether it's just a timeout issue.

Good luck!
Erick

On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com wrote: [...]
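In solrconfig.xml that combination looks like this (the 15-second value is just an example to tune):

    <autoCommit>
      <!-- flush and close segments, truncate the tlog, but do NOT open a new searcher -->
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>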
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
No need to apologise. It's always good to have things like that reiterated in case I've misunderstood along the way.

I have a feeling that it's related to garbage collection. I assume that if the JVM heads into a stop-the-world GC, Solr can't let ZooKeeper know it's still alive and so gets marked as down. I've just taken a look at the GC logs and can see a couple of full collections which took longer than my ZK timeout of 15s. I'm still in the process of tuning the cache sizes and have probably got it wrong (I'm coming from a Solr instance which runs on a 48G heap with ~40m documents and bringing it into five shards with 8G heaps). I thought I was being conservative with the cache sizes, but I should probably drop them right down and start again. The entire index is cached by Linux, so I should just need caches to help with things which eat CPU at request time.

The indexing level is unusual because normally we wouldn't be indexing everything sequentially, just making delta updates to the index as things are changed in our MoR. However, it's handy to know how it reacts under the most extreme load we could give it.

In the case that I set my hard commit time to 15-30 seconds with openSearcher set to false, how do I control when I actually invalidate the caches and open a new searcher? Is this something that Solr can do automatically, or will I need some sort of coordinator process to perform a 'proper' commit from outside Solr? In our case the process of opening a new searcher is definitely a hefty operation. We have a large number of boosts and filters which are used for just about every query made against the index, so we currently have them warmed, which can take upwards of a minute on our giant core.

Thanks for your help.

On 22 July 2013 13:00, Erick Erickson erickerick...@gmail.com wrote: [...]
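If an external coordinator turns out to be the answer, it's a single HTTP call; a sketch, with host and core names as placeholders:

    # explicit hard commit that also opens (and warms) a new searcher;
    # can be fired by cron or by the indexing job every N docs or M minutes
    curl 'http://server01:8983/solr/collection1/update?commit=true&openSearcher=true'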
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Sorry, I should also mention that these leader nodes which are marked as down can actually still be queried locally with distrib=false with no problems. Is it possible that they've somehow got themselves out-of-sync?

On 22 July 2013 13:37, Neil Prosser neil.pros...@gmail.com wrote: [...]
RE: Solr 4.3.1 - SolrCloud nodes down and lost documents
It is possible: https://issues.apache.org/jira/browse/SOLR-4260

I rarely see it and I cannot reliably reproduce it, but it just sometimes happens. Nodes will not bring each other back in sync.

-----Original message-----
From: Neil Prosser neil.pros...@gmail.com
Sent: Monday 22nd July 2013 14:41
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
[...]
RE: Solr 4.3.1 - SolrCloud nodes down and lost documents
You should increase your ZK timeout; this may be the issue in your case. You may also want to try the G1GC collector to keep stop-the-world pauses under the ZK timeout.

-----Original message-----
From: Neil Prosser neil.pros...@gmail.com
Sent: Monday 22nd July 2013 14:38
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
[...]
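Assuming a solr.xml that reads the timeout from a system property, as the stock 4.x example does with zkClientTimeout="${zkClientTimeout:15000}", you can override it at startup; values are milliseconds and illustrative:

    # raise the ZK session timeout so GC pauses don't drop the session
    java -DzkClientTimeout=30000 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar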
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
On 7/22/2013 6:45 AM, Markus Jelsma wrote: "You should increase your ZK time out, this may be the issue in your case. You may also want to try the G1GC collector to keep STW under ZK time out."

When I tried G1, the occasional stop-the-world GC actually got worse. I tried G1 after trying CMS with no other tuning parameters. The average GC time went down, but when it got into a place where it had to do a stop-the-world collection, it was worse.

Based on the GC statistics in jvisualvm and jstat, I didn't think I had a problem. The way I discovered that I had a problem was by looking at my haproxy load balancer -- sometimes requests would be sent to a backup server instead of my primary, because the ping request handler was timing out on the LB health check. The LB was set to time out after five seconds. When I went looking deeper with the GC log and some other tools, I was seeing 8-10 second GC pauses. G1 was showing me pauses of 12 seconds.

Now I use a heavily tuned CMS config, and there are no more LB switches to a backup server. I've put some of my own information about my GC settings on my personal Solr wiki page: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I've got an 8GB heap on my systems running 3.5.0 (one copy of the index) and a 6GB heap on those running 4.2.1 (the other copy of the index).

Summary: Just switching to the G1 collector won't solve GC pause problems. There's not a lot of G1 tuning information out there yet. If someone can come up with a good set of G1 tuning parameters, G1 might become better than CMS.

Thanks,
Shawn
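For readers who don't follow the link: a tuned CMS setup is assembled from flags along these lines. This is an illustrative sketch, not Shawn's exact configuration, and the numbers only mean something once validated against your own GC logs:

    # illustrative CMS tuning plus GC logging; heap size and thresholds are examples
    JVM_OPTS="-Xms6g -Xmx6g \
      -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:+CMSParallelRemarkEnabled \
      -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log"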
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
A couple of things I've learned along the way...

I had a similar architecture where we used fairly low numbers for auto-commits with openSearcher=false. This keeps the tlog to a reasonable size. You'll need something on the client side to send in the hard commit request to open a new searcher every N docs or M minutes.

Be careful with raising the Zk timeout, as that also determines how quickly Zk can detect that a node has crashed (afaik). In other words, it takes the zk client timeout seconds for Zk to consider an ephemeral znode as gone, so I caution you against increasing this value too much.

The other thing to be aware of is the leaderVoteWait safety mechanism ... you might see log messages that look like:

2013-06-24 18:12:40,408 [coreLoadExecutor-4-thread-1] INFO solr.cloud.ShardLeaderElectionContext - Waiting until we see more replicas up: total=2 found=1 timeoutin=139368

From Mark M: "This is a safety mechanism - you can turn it off by configuring leaderVoteWait to 0 in solr.xml. This is meant to protect the case where you stop a shard or it fails and then the first node to get started back up has stale data - you don't want it to just become the leader. So we wait to see everyone we know about in the shard, up to 3 or 5 min by default. Then we know all the shards participate in the leader election and the leader will end up with all the updates it should have. You can lower that wait or turn it off with 0."

NOTE: I tried setting it to 0 and my cluster went haywire, so consider just lowering it but not making it zero ;-)

Max heap of 8GB seems overly large to me for 8M docs per shard, especially since you're using MMapDirectory to cache the primary data structures of your index in OS cache. I have run shards with 40M docs on a 6GB max heap and chose to have more aggressive cache eviction by using a smallish LFU filter cache. This approach seems to spread the cost of GC out over time vs. massive amounts of clean-up when a new searcher is opened. With 8M docs, each cached filter will require about 1M of memory, so it seems like you could run with a smaller heap. I'm not a GC expert, but I found that having a smaller heap and more aggressive cache evictions reduced full GCs (and how long they run for) on my Solr instances.

On Mon, Jul 22, 2013 at 8:09 AM, Shawn Heisey s...@elyograg.org wrote: [...]
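In solrconfig.xml a smallish LFU filter cache looks like this (the sizes are placeholders to tune, not recommendations):

    <!-- small LFU filter cache: evict aggressively instead of hoarding bitsets -->
    <filterCache class="solr.LFUCache" size="64" initialSize="64" autowarmCount="16"/>

Each cached filter is a bitset of maxDoc bits, which is where the ~1MB-per-entry figure above comes from.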
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Are you feeding Graphite from Solr? If so, how?

On 07/19/2013 01:02 AM, Neil Prosser wrote: "That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here)."
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
I just have a little Python script which I run with cron (luckily that's the granularity we have in Graphite). It reads the same JSON the admin UI displays and dumps numeric values into Graphite. I can open source it if you like. I just need to make sure I remove any hacks/shortcuts that I've taken because I'm working with our cluster!

On 22 July 2013 19:26, Lance Norskog goks...@gmail.com wrote: [...]
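The shape of such a script is roughly the following. This is a sketch rather than the real thing - the mbeans URL, Graphite host and metric prefix are illustrative:

    #!/usr/bin/env python
    # Sketch: poll Solr's mbeans stats and push numeric values to Graphite's
    # plaintext listener. URL, host and metric prefix are illustrative.
    import json
    import socket
    import time
    import urllib2

    SOLR_URL = 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json'
    GRAPHITE = ('graphite.example.com', 2003)

    def flatten(prefix, obj, out):
        # walk the nested JSON, collecting only numeric leaf values
        if isinstance(obj, dict):
            for key, value in obj.items():
                flatten(prefix + '.' + str(key).replace('.', '_'), value, out)
        elif isinstance(obj, list):
            for index, value in enumerate(obj):
                flatten('%s.%d' % (prefix, index), value, out)
        elif isinstance(obj, (int, long, float)) and not isinstance(obj, bool):
            out.append((prefix, obj))

    def main():
        stats = json.load(urllib2.urlopen(SOLR_URL))
        metrics = []
        flatten('solr.collection1', stats, metrics)
        now = int(time.time())
        sock = socket.create_connection(GRAPHITE)
        try:
            for path, value in metrics:
                # Graphite plaintext protocol: "<path> <value> <timestamp>\n"
                sock.sendall('%s %s %d\n' % (path, value, now))
        finally:
            sock.close()

    if __name__ == '__main__':
        main()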
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Well, if I'm reading this right, you had a node go out of circulation and then bounced nodes until that node became the leader. So of course it wouldn't have the documents (how could it?). Basically you shot yourself in the foot.

Underlying here is why it took the machine you were re-starting so long to come up that you got impatient and started killing nodes. There has been quite a bit done to make that process better, so what version of Solr are you using? 4.4 is being voted on right now, so you might want to consider upgrading. There was, for instance, a situation where it would take 3 minutes for machines to start up. How impatient were you?

Also, what are your hard commit parameters? All of the documents you're indexing will be in the transaction log between hard commits, and when a node comes up the leader will replay everything in the tlog to the new node, which might be a source of why it took so long for the new node to come back up. At the very least the new node you were bringing back online will need to do a full index replication (old style) to get caught up.

Best
Erick

On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser neil.pros...@gmail.com wrote:

While indexing some documents to a SolrCloud cluster (10 machines, 5 shards and 2 replicas, so one replica on each machine) one of the replicas stopped receiving documents, while the other replica of the shard continued to grow. That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here). This morning when I was able to look at the cluster, both replicas of that shard were marked as down (with one marked as leader). I attempted to restart the non-leader node but it took a long time to restart, so I killed it and restarted the old leader, which also took a long time. I killed that one (I'm impatient) and left the non-leader node to restart, not realising it was missing approximately 700k documents that the old leader had. Eventually it restarted and became leader. I restarted the old leader and it dropped the number of documents it had to match the previous non-leader.

Is this expected behaviour when a replica with fewer documents is started before the other and elected leader? Should I have been paying more attention to the number of documents on the servers before restarting nodes?

I am still in the process of tuning the caches and warming for these servers, but we are putting some load through the cluster, so it is possible that the nodes are having to work quite hard when a new version of the core is made available. Is this likely to explain why I occasionally see nodes dropping out? Unfortunately in restarting the nodes I lost the GC logs to see whether that was likely to be the culprit. Is this the sort of situation where you raise the ZooKeeper timeout a bit? Currently the timeout for all nodes is 15 seconds.

Are there any known issues which might explain what's happening? I'm just getting started with SolrCloud after using standard master/slave replication for an index which has got too big for one machine over the last few months.

Also, is there any particular information that would be helpful if these issues should happen again?