Re: Why is SolrCloud doing a full copy of the index?
Hi Shawn,

Thanks a lot for this entry! I'm wondering, when you say "Garbage collections that happen more often than ten or so times per minute may be an indication that the heap size is too small", do you mean *any* collections, or just full collections?

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Sat, May 4, 2013 at 1:55 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
>> Advance warning: this is a long reply.
>
> I have condensed some relevant performance problem information into the
> following wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Anyone who has additional information for this page, feel free to add it.
> I hope I haven't made too many mistakes!
>
> Thanks,
> Shawn
Re: Why is SolrCloud doing a full copy of the index?
On 5/6/2013 1:39 PM, Michael Della Bitta wrote:
> Hi Shawn,
>
> Thanks a lot for this entry! I'm wondering, when you say "Garbage
> collections that happen more often than ten or so times per minute may be
> an indication that the heap size is too small", do you mean *any*
> collections, or just full collections?

My gut reaction is any collection, but in extremely busy environments a rate of ten per minute might be a very slow day on a setup that's working perfectly. As I wrote that particular bit, I was thinking that any number I put there was probably wrong for some large subset of users, but I wanted to finish putting down my thoughts and improve it later.

Thanks,
Shawn
Re: Why is SolrCloud doing a full copy of the index?
Hi,

I just looked at SPM monitoring we have for the Solr servers that run search-lucene.com. One of them has 1-2 collections/minute. Another one is closer to 10. These are both small servers with small JVM heaps. Here is a graph of one of them: https://apps.sematext.com/spm/s/104ppwguao

I just looked at some other Java servers we have running, not Solr, and I see close to 60 small collections per minute. So these numbers will vary a lot depending on the heap size and other JVM settings, as well as the actual code/usage. :)

Otis
--
Solr ElasticSearch Support
http://sematext.com/

On Mon, May 6, 2013 at 4:39 PM, Shawn Heisey <s...@elyograg.org> wrote:
> My gut reaction is any collection, but in extremely busy environments a
> rate of ten per minute might be a very slow day on a setup that's working
> perfectly. As I wrote that particular bit, I was thinking that any number
> I put there was probably wrong for some large subset of users, but I
> wanted to finish putting down my thoughts and improve it later.
>
> Thanks,
> Shawn
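The per-minute GC rates being compared here can be measured directly from a JVM GC log. Below is a minimal sketch in Python, assuming `-XX:+PrintGCDateStamps`-style log lines; the exact log format varies with JVM version and flags, so the regex is an assumption rather than the one true format:

```python
import re
from collections import Counter

# Matches lines such as:
#   2013-05-06T16:39:01.123+0000: [GC 262144K->10500K(1005568K), 0.0041 secs]
# and captures the timestamp truncated to the minute. "Full GC" lines match too.
GC_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\.\d+[+-]\d{4}: \[(?:Full )?GC"
)

def gc_events_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DDTHH:MM' -> number of GC events."""
    counts = Counter()
    for line in lines:
        m = GC_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical sample log lines, in the assumed format:
sample = [
    "2013-05-06T16:39:01.123+0000: [GC 262144K->10500K(1005568K), 0.0041 secs]",
    "2013-05-06T16:39:22.456+0000: [GC 272644K->11032K(1005568K), 0.0039 secs]",
    "2013-05-06T16:40:05.789+0000: [Full GC 11032K->9800K(1005568K), 0.2100 secs]",
]
print(gc_events_per_minute(sample))
```

A minute whose count sits well above your own baseline is the kind of signal the wiki page's rule of thumb is trying to capture; as noted above, the absolute threshold varies widely between setups.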
Re: Why is SolrCloud doing a full copy of the index?
Second the thanks!

Erick

On Sat, May 4, 2013 at 6:08 PM, Lance Norskog <goks...@gmail.com> wrote:
> Great! Thank you very much Shawn.
>
> On 05/04/2013 10:55 AM, Shawn Heisey wrote:
>> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
>>> Advance warning: this is a long reply.
>>
>> I have condensed some relevant performance problem information into the
>> following wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems
>>
>> Anyone who has additional information for this page, feel free to add it.
>> I hope I haven't made too many mistakes!
>>
>> Thanks,
>> Shawn
Re: Why is SolrCloud doing a full copy of the index?
Thanks for the replies. It is really appreciated.

Based on the replies, it seems like upgrading to the latest version of Solr will probably resolve this issue.

We also update quite frequently, every 5 minutes. We will try setting this to a higher interval and see if that helps. We will also try increasing the servlet timeout and see if that resolves the issue. Among the other suggestions, we already tried increasing the zkClientTimeout from 15 seconds to 30 seconds, but that didn't seem to help. What do you recommend as a good value to try?

A few more details about our system: we are running this on servers with 16GB of RAM. We are using 64 bit servers and we also use SSD disks.

Also, since we are already using 4.0 in our production environment with the aforementioned 3-server setup, how should we go about upgrading to the latest version (4.3)? Do we need to do a full reindex of our data, or is the index compatible between these versions?

We will try out the suggestions and will post later if any of them help us resolve the issue. Again, thanks for the replies.

--
View this message in context: http://lucene.472066.n3.nabble.com/Why-is-SolrCloud-doing-a-full-copy-of-the-index-tp4060800p4060897.html
Sent from the Solr - User mailing list archive at Nabble.com.
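For anyone tuning zkClientTimeout on Solr 4.x, it is set as an attribute on the <cores> element in solr.xml (or passed in as a system property). A sketch with the value raised to 30 seconds follows; the surrounding attributes are illustrative, not a complete solr.xml:

```xml
<!-- solr.xml (Solr 4.x legacy format); zkClientTimeout is in milliseconds -->
<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="${host:}" hostPort="${jetty.port:}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```

Note that the effective session timeout is also capped by the ZooKeeper server's own maxSessionTimeout, so raising the client side alone may not be enough.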
Re: Why is SolrCloud doing a full copy of the index?
Advance warning: this is a long reply. Awesome Shawn. Thanks!
Re: Why is SolrCloud doing a full copy of the index?
Hmmm, there was a problem with replication where it would do a full copy unnecessarily; that was fixed in 4.2 (I think). Frankly, I don't quite know whether it was a problem in 4.0. It's also possible that your servlet containers have a short enough timeout that you're occasionally just getting connection timeouts, so lengthening that interval might be worthwhile, but that's a stab in the dark.

Best,
Erick

On Sat, May 4, 2013 at 4:06 AM, Kumar Limbu <kumarli...@gmail.com> wrote:
> We have Solr set up on 3 machines with only a single shard. We are using
> Solr 4.0 and currently have around 7 million documents in our index. The
> size of our index is around 25 GB. We have a zookeeper ensemble of 3
> zookeeper instances.
>
> Let's call the servers in our setup (A), (B) and (C). All updates to Solr
> go via server (C). Searches are performed on servers (A) and (B). The
> updates are normally propagated incrementally from server (C) to the
> other 2 servers.
>
> Intermittently we have noticed that servers (A) and (B) make a full copy
> of the index from server (C). This is not ideal, because when this
> happens performance suffers. This occurs quite randomly and can occur on
> either of the other 2 nodes, i.e. (A) and (B).
>
> On server (C), which is the leader, we see errors like the following. We
> suspect this might be the reason why a full index copy occurs on the
> other nodes, but we haven't been able to find out why this error is
> occurring. There is no connectivity issue with the servers.
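For the servlet container timeout Erick mentions: the Jetty bundled with Solr 4.x configures a connector idle timeout in its etc/jetty.xml. A hedged sketch follows; the element names match Jetty 8 (as shipped with Solr 4.x example configs) but may differ in other Jetty versions, and the value shown is illustrative:

```xml
<!-- etc/jetty.xml (Jetty 8 as bundled with Solr 4.x); maxIdleTime is in ms -->
<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">120000</Set> <!-- raised from the shipped 50000 -->
    </New>
  </Arg>
</Call>
```

A too-short idle timeout can cause exactly the NoHttpResponseException pattern shown in the stack trace below: the container silently drops a connection the leader still thinks is usable.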
See the stacktrace below:

SEVERE: shard update error StdNode: http://serverA/solr/rn0/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://serverA/solr/rn0
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
    at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 11 more

If anyone can help us troubleshoot this problem we will really appreciate the help. If there are any questions regarding our setup or further information regarding the error, please let me know.

--
View this message in context: http://lucene.472066.n3.nabble.com/Why-is-SolrCloud-doing-a-full-copy-of-the-index-tp4060800.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Why is SolrCloud doing a full copy of the index?
On 5/4/2013 2:06 AM, Kumar Limbu wrote:
> We have Solr set up on 3 machines with only a single shard. We are using
> Solr 4.0 and currently have around 7 million documents in our index. The
> size of our index is around 25 GB. We have a zookeeper ensemble of 3
> zookeeper instances.
>
> Let's call the servers in our setup (A), (B) and (C). All updates to Solr
> go via server (C). Searches are performed on servers (A) and (B). The
> updates are normally propagated incrementally from server (C) to the
> other 2 servers.
>
> Intermittently we have noticed that servers (A) and (B) make a full copy
> of the index from server (C). This is not ideal, because when this
> happens performance suffers. This occurs quite randomly and can occur on
> either of the other 2 nodes, i.e. (A) and (B).
>
> On server (C), which is the leader, we see errors like the following. We
> suspect this might be the reason why a full index copy occurs on the
> other nodes, but we haven't been able to find out why this error is
> occurring. There is no connectivity issue with the servers.

Advance warning: this is a long reply.

The first thing that jumped out at me was the Solr version. Version 4.0 was brand new in October of last year. It's a senior citizen now. It has a lot of bugs, particularly in SolrCloud stability. I would recommend upgrading to at least 4.2.1. Version 4.3.0 (the fourth release since 4.0) is quite literally about to be unveiled; it is already on a lot of download mirrors, and the announcement is due any time now.

Now for things to consider that don't involve upgrading, but might still be issues after upgrading:

You might be able to make your system more stable by increasing your zkClientTimeout. A typical example value for this setting is 15 seconds.

Next we will discuss why you might be exceeding the timeout. Slow operations, especially on commits, can be responsible for exceeding timeouts. One of the things you can do to decrease commit time is to lower the autowarmCount on your Solr caches.
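The autowarmCount Shawn refers to is configured per cache in solrconfig.xml. An illustrative fragment follows; the cache sizes and counts here are placeholders, not recommendations:

```xml
<!-- solrconfig.xml: smaller autowarmCount -> shorter warm-up, faster commits -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="16"/>
<!-- documentCache cannot be autowarmed, so it takes no autowarmCount -->
<documentCache class="solr.LRUCache"
               size="512" initialSize="512"/>
```

The trade-off is that a freshly opened searcher starts with colder caches, so the first queries after a commit may be slower.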
You can also decrease the frequency of your commits.

A 25GB index is relatively large, and requires a lot of memory for proper operation. The reason it requires a lot of memory is that Solr is very reliant on the operating system disk cache, which uses free memory.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

With a 25GB index, you want to have between 15 and 25GB of memory over and above the memory that your programs use. You would probably want to give the Java heap for Solr between 4 and 8GB. For a dedicated Solr server with your index, a really good amount of total system memory would be 32GB, with 24GB being a reasonable starting point.

It should go without saying that you need a 64 bit server, a 64 bit operating system, and 64 bit Java for all of this to work correctly. 32 bit software is not good at dealing with large amounts of memory, and 32 bit Java cannot have a heap size larger than 2GB.

If you upgrade to 4.2.1 or later and reindex, your index size will drop due to compression of certain pieces. Those pieces don't normally affect minimum memory requirements very much, so your free memory requirement will still probably be at least 15GB.

Unless you are using a commercial JVM with low-pause characteristics (like Zing), a heap of 4GB or larger can give you problems with stop-the-world GC pauses. A large heap is unfortunately required with a large index. The default collector that Java gives you is a *terrible* choice for large heaps in general and Solr in particular. Even changing to the CMS collector may not be enough; more tuning is required.

Thanks,
Shawn
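As a concrete sketch of the CMS tuning Shawn alludes to, the JVM options below are a commonly used starting point for Solr 4.x on the Oracle JVM. The heap sizes and occupancy threshold are illustrative only and must be tuned per workload; the GC logging flags make it possible to measure the collection rates discussed earlier in the thread:

```shell
# Illustrative java invocation for the Solr 4.x example start script.
# Heap sizes and CMS threshold are assumptions, not universal recommendations.
java -Xms4g -Xmx6g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar start.jar
```

Setting -Xms equal to -Xmx avoids heap-resize pauses, and the explicit occupancy fraction keeps CMS from starting collections too late on a mostly full heap.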
Re: Why is SolrCloud doing a full copy of the index?
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
> Advance warning: this is a long reply.

I have condensed some relevant performance problem information into the following wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Anyone who has additional information for this page, feel free to add it. I hope I haven't made too many mistakes!

Thanks,
Shawn
Re: Why is SolrCloud doing a full copy of the index?
Great! Thank you very much Shawn.

On 05/04/2013 10:55 AM, Shawn Heisey wrote:
> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
>> Advance warning: this is a long reply.
>
> I have condensed some relevant performance problem information into the
> following wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Anyone who has additional information for this page, feel free to add it.
> I hope I haven't made too many mistakes!
>
> Thanks,
> Shawn