Re: Query time out. Solr node goes down.
On Tue, 2015-08-18 at 14:36 +0530, Modassar Ather wrote: > So Toke/Daniel is the node showing *gone* on Solr cloud dashboard is > because of GC pause and it is actually not gone but the ZK is not able to > get the correct state? That would be my guess. > The issue is caused by a huge query with many wildcards and phrases in it. > If you see I have mentioned (*The request took too long to iterate > over terms.*). So does it mean that the terms which are getting expanded > have taken that much memory? Just trying to understand what consumes so > much of memory. If you can paste a problematic query, it is easier to see what is happening. - Toke Eskildsen, State and University Library, Denmark
Re: Query time out. Solr node goes down.
bq: The issue is caused by a huge query with many wildcards and phrases in it. Well, the very first thing I'd do is look at whether this is necessary. For instance: leading and trailing wildcards are an anti-pattern; you should investigate using ngrams instead. Trailing wildcards usually resolve to term queries and are usually quite space-efficient. Leading wildcards are usually best handled by ReversedWildcardFilterFactory. Very often, large, complex wildcarded queries are a holdover from SQL searching, which is limited to things like %whatever% and doesn't take into account things like the Solr analysis chain. A classic example is people searching for run* to find runner, running, runs etc., all of which can be handled by stemming. FWIW, Erick On Tue, Aug 18, 2015 at 2:06 AM, Modassar Ather wrote: > So Toke/Daniel is the node showing *gone* on Solr cloud dashboard is > because of GC pause and it is actually not gone but the ZK is not able to > get the correct state? > The issue is caused by a huge query with many wildcards and phrases in it. > If you see I have mentioned (*The request took too long to iterate > over terms.*). So does it mean that the terms which are getting expanded > have taken that much memory? Just trying to understand what consumes so > much of memory. > I am trying to reproduce the OOM by executing multiple queries in parallel > but not able to whereas I am seeing the memory usage going up by more than > 90+% for Solr JVM. So what happens to the query which is executed in > parallel. Do they wait for such query to timeout/complete which is taking > lot of time and resources? > We also have migration to java 8 on our things to do list and will try with > different GC settings. > > > > On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins > wrote: > >> Ah ok, it's a ZK timeout then >> (org.apache.zookeeper.KeeperException$SessionExpiredException) >> which is because of your GC pause. 
>> >> The page Shawn mentioned earlier has several links on how to investigate GC >> issues and some common GC settings, sounds like you need to tweak those. >> >> Generally speaking, I believe Java 8 is considered better for GC >> performance than 7, so you probably want to investigate that. GC tuning is >> very dependent on the load on your system. You may be running close to the >> limit under normal load, and that 1 big query is enough to tip it over the >> edge. We have seen similar issues from time to time. We are still running >> an older Java 7 build with G1GC which we found worked well for us (though >> CMS seems to be the general consensus on the list here), migrating to Java >> 8 is on our "list of things to do", so our settings are probably not that >> relevant. >> >> >> On 18 August 2015 at 09:04, Toke Eskildsen wrote: >> >> > On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote: >> > > Kindly help me understand, even if there is a GC pause why the solr >> > node >> > > will go down. >> > >> > If a stop-the-world GC is in progress, it is not possible for an >> > external service to know if this is because a GC is in progress or the >> > node is dead. If the GC takes longer than the relevant timeouts, the >> > external conclusion is that it is dead. >> > >> > In your next post you state that there is very heavy GC going on, so it >> > would seem that your main problem is that your heap is too small for >> > your setup. >> > >> > Getting OOM for a 200GB index with 24GB heap is not at all impossible, >> > but it is a bit of a red flag. If you have very high values for your >> > caches or perform faceting on a lot of different fields, that might be >> > the cause. If you describe your setup in more detail, we might be able >> > to help find the cause for your relatively high heap requirement. >> > >> > - Toke Eskildsen, State and University Library, Denmark >> > >> > >>
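The reversed-wildcard trick Erick mentions can be shown with a small sketch. This is only an illustration of the idea behind ReversedWildcardFilterFactory, not Lucene's actual implementation: index each token reversed, and a leading-wildcard query like *ning becomes a cheap prefix scan over the reversed terms instead of a full term-dictionary walk.

```python
# Sketch only: illustrates the reversed-wildcard idea, not Lucene code.

def index_reversed(tokens):
    """Store each token reversed, as the filter would at index time."""
    return sorted(t[::-1] for t in tokens)

def leading_wildcard_matches(reversed_index, suffix):
    """Rewrite '*suffix' as a prefix scan over the reversed terms."""
    prefix = suffix[::-1]
    return sorted(t[::-1] for t in reversed_index if t.startswith(prefix))

terms = ["running", "runner", "gardening", "runs"]
idx = index_reversed(terms)
print(leading_wildcard_matches(idx, "ning"))  # ['gardening', 'running']
```

A prefix scan touches only the contiguous slice of the sorted term dictionary starting with the prefix, which is why trailing (and, with this trick, leading) wildcards are so much cheaper than a wildcard in the middle of a term.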
Re: Query time out. Solr node goes down.
So Toke/Daniel, is the node showing *gone* on the Solr cloud dashboard because of a GC pause, and is it actually not gone but ZK is not able to get the correct state? The issue is caused by a huge query with many wildcards and phrases in it. If you see, I have mentioned (*The request took too long to iterate over terms.*). So does it mean that the terms which are getting expanded have taken that much memory? Just trying to understand what consumes so much of memory. I am trying to reproduce the OOM by executing multiple queries in parallel but am not able to, whereas I am seeing the memory usage going up to more than 90% for the Solr JVM. So what happens to the queries which are executed in parallel? Do they wait for such a query, which is taking a lot of time and resources, to timeout/complete? We also have migration to Java 8 on our to-do list and will try different GC settings. On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins wrote: > Ah ok, it's a ZK timeout then > (org.apache.zookeeper.KeeperException$SessionExpiredException) > which is because of your GC pause. > > The page Shawn mentioned earlier has several links on how to investigate GC > issues and some common GC settings, sounds like you need to tweak those. > > Generally speaking, I believe Java 8 is considered better for GC > performance than 7, so you probably want to investigate that. GC tuning is > very dependent on the load on your system. You may be running close to the > limit under normal load, and that 1 big query is enough to tip it over the > edge. We have seen similar issues from time to time. We are still running > an older Java 7 build with G1GC which we found worked well for us (though > CMS seems to be the general consensus on the list here), migrating to Java > 8 is on our "list of things to do", so our settings are probably not that > relevant. 
> > > On 18 August 2015 at 09:04, Toke Eskildsen wrote: > > > On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote: > > > Kindly help me understand, even if there is a GC pause why the solr > > node > > > will go down. > > > > If a stop-the-world GC is in progress, it is not possible for an > > external service to know if this is because a GC is in progress or the > > node is dead. If the GC takes longer than the relevant timeouts, the > > external conclusion is that it is dead. > > > > In your next post you state that there is very heavy GC going on, so it > > would seem that your main problem is that your heap is too small for > > your setup. > > > > Getting OOM for a 200GB index with 24GB heap is not at all impossible, > > but it is a bit of a red flag. If you have very high values for your > > caches or perform faceting on a lot of different fields, that might be > > the cause. If you describe your setup in more detail, we might be able > > to help find the cause for your relatively high heap requirement. > > > > - Toke Eskildsen, State and University Library, Denmark > > > > > > >
Re: Query time out. Solr node goes down.
Ah ok, it's a ZK timeout then (org.apache.zookeeper.KeeperException$SessionExpiredException), which is because of your GC pause. The page Shawn mentioned earlier has several links on how to investigate GC issues and some common GC settings; it sounds like you need to tweak those. Generally speaking, I believe Java 8 is considered better for GC performance than 7, so you probably want to investigate that. GC tuning is very dependent on the load on your system. You may be running close to the limit under normal load, and that 1 big query is enough to tip it over the edge. We have seen similar issues from time to time. We are still running an older Java 7 build with G1GC which we found worked well for us (though CMS seems to be the general consensus on the list here); migrating to Java 8 is on our "list of things to do", so our settings are probably not that relevant. On 18 August 2015 at 09:04, Toke Eskildsen wrote: > On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote: > > Kindly help me understand, even if there is a GC pause why the solr > node > > will go down. > > If a stop-the-world GC is in progress, it is not possible for an > external service to know if this is because a GC is in progress or the > node is dead. If the GC takes longer than the relevant timeouts, the > external conclusion is that it is dead. > > In your next post you state that there is very heavy GC going on, so it > would seem that your main problem is that your heap is too small for > your setup. > > Getting OOM for a 200GB index with 24GB heap is not at all impossible, > but it is a bit of a red flag. If you have very high values for your > caches or perform faceting on a lot of different fields, that might be > the cause. If you describe your setup in more detail, we might be able > to help find the cause for your relatively high heap requirement. > > - Toke Eskildsen, State and University Library, Denmark > > >
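For reference, on Solr 5.x the start script picks up GC settings from the GC_TUNE variable in bin/solr.in.sh. The values below are only an illustrative starting point for experimenting with G1 as Daniel describes; as he notes, the right settings depend entirely on your load, so treat this as a sketch to tune against your own GC logs, not a recommendation.

```shell
# Sketch only -- verify against your own GC logs before adopting.
# Solr 5.x reads this from bin/solr.in.sh (solr.in.cmd on Windows).
GC_TUNE="-XX:+UseG1GC \
-XX:MaxGCPauseMillis=250 \
-XX:+ParallelRefProcEnabled"
```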
Re: Query time out. Solr node goes down.
On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote: > Kindly help me understand, even if there is a GC pause why the solr node > will go down. If a stop-the-world GC is in progress, it is not possible for an external service to know if this is because a GC is in progress or the node is dead. If the GC takes longer than the relevant timeouts, the external conclusion is that it is dead. In your next post you state that there is very heavy GC going on, so it would seem that your main problem is that your heap is too small for your setup. Getting OOM for a 200GB index with 24GB heap is not at all impossible, but it is a bit of a red flag. If you have very high values for your caches or perform faceting on a lot of different fields, that might be the cause. If you describe your setup in more detail, we might be able to help find the cause for your relatively high heap requirement. - Toke Eskildsen, State and University Library, Denmark
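The relevant timeout Toke refers to is, in a standard SolrCloud setup, the ZooKeeper session timeout (zkClientTimeout in solr.xml, 30 seconds by default in Solr 5.x). The fragment below is a sketch assuming the stock solr.xml layout; raising the value buys headroom against long pauses, but it only masks the symptom rather than fixing the GC behaviour.

```xml
<!-- Sketch only: a node is declared gone when a GC pause outlasts this
     session timeout. Raising it hides shorter pauses; it does not make
     the GC itself any cheaper. -->
<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>
```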
Re: Query time out. Solr node goes down.
I tried to profile the memory of each Solr node. I can see the GC activity going as high as 98%, and there are many instances where it has gone up to 10+%. In one of the Solr nodes I can see it going to 45%. Memory is fully used and has gone to the maximum heap usage, which is set to 24g. During other searches I can see the error *org.apache.solr.common.SolrException: no servers hosting shard.* A few nodes are in the gone state. There are many instances of *org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$SessionExpiredException.* GC logs show very busy garbage collection. Please provide your inputs. On Tue, Aug 18, 2015 at 10:38 AM, Modassar Ather wrote: > Shawn! The container I am using is jetty only and the JVM setting I am > using is the default one which comes with Solr startup scripts. Yes I have > changed the JVM memory setting as mentioned. > Kindly help me understand, even if there is a GC pause why the solr node > will go down. At least for other queries it should not throw exception of > *org.apache.solr.common.SolrException: no servers hosting shard.* > Why the node will throw above exception even a huge query is time out or > may have taken a lot of resources. Kindly help me understand in what > conditions such exception can arise as I am not fully aware of it. > > Daniel! The error logs do not say if it was JVM crash or just solr. But by > the exception I understand that it might have gone to a state from where it > recovered after sometime. I did not restart the Solr. > > On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins > wrote: >> When you say "the solr node goes down", what do you mean by that? From >> your >> comment on the logs, you obviously lose the solr core at best (you do >> realize only having a single replica is inherently susceptible to failure, >> right?) >> But do you mean the Solr Core drops out of the collection (ZK timeout), >> the >> JVM stops, the whole machine crashes? 
>> >> On 17 August 2015 at 14:17, Shawn Heisey wrote: >> >> > On 8/17/2015 5:45 AM, Modassar Ather wrote: >> > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g >> > > -Xmx24g. There are no OOM in logs. >> > >> > Are you starting Solr 5.2.1 with the included start script, or have you >> > installed it into another container? >> > >> > Assuming you're using the download's "bin/solr" script, that will >> > normally set Xms and Xmx to the same value, so if you have overridden >> > the memory settings such that you can have different values in Xms and >> > Xmx, have you also overridden the garbage collection parameters? If you >> > have, what are they set to now? You can see all arguments used on >> > startup in the "JVM" section of the admin UI dashboard. >> > >> > If you've installed in an entirely different container, or you have >> > overridden the garbage collection settings, then a 24GB heap might have >> > extreme garbage collection pauses, lasting long enough to exceed the >> > timeout. >> > >> > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB >> > left over for caching the index. With 200GB of index, this is nowhere >> > near enough, and is another likely source of Solr performance problems >> > that cause timeouts. This is what Upayavira was referring to in his >> > reply. For good performance with 200GB of index, you may need a lot >> > more than 32GB of total RAM. >> > >> > https://wiki.apache.org/solr/SolrPerformanceProblems >> > >> > This wiki page also describes how you can use jconsole to judge how much >> > heap you actually need. 24GB may be too much. >> > >> > Thanks, >> > Shawn >> > >> > >> > >
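To correlate the busy GC activity above with the session expirations, it helps to have timestamps and stopped-time recorded in the GC log. A sketch for Java 7 (these flag names changed in Java 9+, and the log path is a placeholder; on Solr 5.x the GC_LOG_OPTS variable in bin/solr.in.sh is one place to put them):

```shell
# Illustrative Java 7 GC-logging flags. -XX:+PrintGCApplicationStoppedTime
# records how long the application was actually paused, which is the
# number to compare against the ZooKeeper session timeout.
GC_LOG_OPTS="-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime \
-Xloggc:/var/solr/logs/solr_gc.log"
```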
Re: Query time out. Solr node goes down.
Shawn! The container I am using is jetty only, and the JVM settings I am using are the defaults which come with the Solr startup scripts. Yes, I have changed the JVM memory setting as mentioned. Kindly help me understand: even if there is a GC pause, why will the solr node go down? At least for other queries it should not throw the exception *org.apache.solr.common.SolrException: no servers hosting shard.* Why will the node throw the above exception even if a huge query times out or takes a lot of resources? Kindly help me understand in what conditions such an exception can arise, as I am not fully aware of it. Daniel! The error logs do not say if it was a JVM crash or just Solr. But from the exception I understand that it might have gone to a state from which it recovered after some time. I did not restart Solr. On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins wrote: > When you say "the solr node goes down", what do you mean by that? From your > comment on the logs, you obviously lose the solr core at best (you do > realize only having a single replica is inherently susceptible to failure, > right?) > But do you mean the Solr Core drops out of the collection (ZK timeout), the > JVM stops, the whole machine crashes? > > On 17 August 2015 at 14:17, Shawn Heisey wrote: > > > On 8/17/2015 5:45 AM, Modassar Ather wrote: > > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g > > > -Xmx24g. There are no OOM in logs. > > > > Are you starting Solr 5.2.1 with the included start script, or have you > > installed it into another container? > > > > Assuming you're using the download's "bin/solr" script, that will > > normally set Xms and Xmx to the same value, so if you have overridden > > the memory settings such that you can have different values in Xms and > > Xmx, have you also overridden the garbage collection parameters? If you > > have, what are they set to now? You can see all arguments used on > > startup in the "JVM" section of the admin UI dashboard. 
> > > > If you've installed in an entirely different container, or you have > > overridden the garbage collection settings, then a 24GB heap might have > > extreme garbage collection pauses, lasting long enough to exceed the > > timeout. > > > > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB > > left over for caching the index. With 200GB of index, this is nowhere > > near enough, and is another likely source of Solr performance problems > > that cause timeouts. This is what Upayavira was referring to in his > > reply. For good performance with 200GB of index, you may need a lot > > more than 32GB of total RAM. > > > > https://wiki.apache.org/solr/SolrPerformanceProblems > > > > This wiki page also describes how you can use jconsole to judge how much > > heap you actually need. 24GB may be too much. > > > > Thanks, > > Shawn > > > > >
Re: Query time out. Solr node goes down.
When you say "the solr node goes down", what do you mean by that? From your comment on the logs, you obviously lose the solr core at best (you do realize only having a single replica is inherently susceptible to failure, right?) But do you mean the Solr Core drops out of the collection (ZK timeout), the JVM stops, the whole machine crashes? On 17 August 2015 at 14:17, Shawn Heisey wrote: > On 8/17/2015 5:45 AM, Modassar Ather wrote: > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g > > -Xmx24g. There are no OOM in logs. > > Are you starting Solr 5.2.1 with the included start script, or have you > installed it into another container? > > Assuming you're using the download's "bin/solr" script, that will > normally set Xms and Xmx to the same value, so if you have overridden > the memory settings such that you can have different values in Xms and > Xmx, have you also overridden the garbage collection parameters? If you > have, what are they set to now? You can see all arguments used on > startup in the "JVM" section of the admin UI dashboard. > > If you've installed in an entirely different container, or you have > overridden the garbage collection settings, then a 24GB heap might have > extreme garbage collection pauses, lasting long enough to exceed the > timeout. > > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB > left over for caching the index. With 200GB of index, this is nowhere > near enough, and is another likely source of Solr performance problems > that cause timeouts. This is what Upayavira was referring to in his > reply. For good performance with 200GB of index, you may need a lot > more than 32GB of total RAM. > > https://wiki.apache.org/solr/SolrPerformanceProblems > > This wiki page also describes how you can use jconsole to judge how much > heap you actually need. 24GB may be too much. > > Thanks, > Shawn > >
Re: Query time out. Solr node goes down.
On 8/17/2015 5:45 AM, Modassar Ather wrote: > The servers have 32g memory each. Solr JVM memory is set to -Xms20g > -Xmx24g. There are no OOM in logs. Are you starting Solr 5.2.1 with the included start script, or have you installed it into another container? Assuming you're using the download's "bin/solr" script, that will normally set Xms and Xmx to the same value, so if you have overridden the memory settings such that you can have different values in Xms and Xmx, have you also overridden the garbage collection parameters? If you have, what are they set to now? You can see all arguments used on startup in the "JVM" section of the admin UI dashboard. If you've installed in an entirely different container, or you have overridden the garbage collection settings, then a 24GB heap might have extreme garbage collection pauses, lasting long enough to exceed the timeout. Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB left over for caching the index. With 200GB of index, this is nowhere near enough, and is another likely source of Solr performance problems that cause timeouts. This is what Upayavira was referring to in his reply. For good performance with 200GB of index, you may need a lot more than 32GB of total RAM. https://wiki.apache.org/solr/SolrPerformanceProblems This wiki page also describes how you can use jconsole to judge how much heap you actually need. 24GB may be too much. Thanks, Shawn
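The arithmetic behind Shawn's point, as a quick sketch (numbers are from this thread; the OS and other processes also take a share of RAM, so the page-cache figure is an upper bound):

```python
# Back-of-the-envelope: whatever the JVM heap claims, the OS page cache
# gets at most the remainder, and that remainder is what caches the index.
total_ram_gb = 32
heap_gb = 24     # -Xmx from this thread
index_gb = 200   # on-disk index per node

page_cache_gb = total_ram_gb - heap_gb
cached_fraction = page_cache_gb / index_gb
print(f"At most {page_cache_gb} GB of page cache for a {index_gb} GB "
      f"index ({cached_fraction:.0%})")  # At most 8 GB ... (4%)
```

With only a few percent of the index cacheable, most term-dictionary and posting reads hit disk, which is one plausible source of the slow queries and timeouts.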
Re: Query time out. Solr node goes down.
Thanks Upayavira for your inputs. The Java version is 1.7.0_79. On Mon, Aug 17, 2015 at 5:57 PM, Upayavira wrote: > Hoping that others will chime in here with other ideas. Have you, > though, tried reducing the JVM memory, leaving more available for the OS > disk cache? Having said that, I'd expect that to improve performance, > not to cause JVM crashes. > > It might also help to know what version of Java you are running. > > Upayavira > > On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote: > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g > > -Xmx24g. There are no OOM in logs. > > > > Regards, > > Modassar > > > > On Mon, Aug 17, 2015 at 5:06 PM, Upayavira wrote: > > > > > How much memory does each server have? How much of that memory is > > > assigned to the JVM? Is anything reported in the logs (e.g. > > > OutOfMemoryError)? > > > > > > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote: > > > > Hi, > > > > > > > > I have a Solr cluster which hosts around 200 GB of index on each > node and > > > > are 6 nodes. Solr version is 5.2.1. > > > > When a huge query is fired, it times out *(The request took too long > to > > > > iterate over terms.)*, which I can see in the log but at same time > the > > > > one > > > > of the Solr node goes down and the logs on the Solr nodes starts > showing > > > > > > > > > > > > *following exception.org.apache.solr.common.SolrException: no servers > > > > hosting shard.* > > > > For sometime the shards are not responsive and other queries are not > > > > searched till the node(s) are back again. This is fine but what > could be > > > > the possible cause of solr node going down. > > > > The other exception after the solr node goes down is leader election > > > > related which is not a concern as there is no replica of the nodes. > > > > > > > > Please provide your suggestions. > > > > > > > > Thanks, > > > > Modassar > > > >
Re: Query time out. Solr node goes down.
Hoping that others will chime in here with other ideas. Have you, though, tried reducing the JVM memory, leaving more available for the OS disk cache? Having said that, I'd expect that to improve performance, not to cause JVM crashes. It might also help to know what version of Java you are running. Upayavira On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote: > The servers have 32g memory each. Solr JVM memory is set to -Xms20g > -Xmx24g. There are no OOM in logs. > > Regards, > Modassar > > On Mon, Aug 17, 2015 at 5:06 PM, Upayavira wrote: > > > How much memory does each server have? How much of that memory is > > assigned to the JVM? Is anything reported in the logs (e.g. > > OutOfMemoryError)? > > > > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote: > > > Hi, > > > > > > I have a Solr cluster which hosts around 200 GB of index on each node and > > > are 6 nodes. Solr version is 5.2.1. > > > When a huge query is fired, it times out *(The request took too long to > > > iterate over terms.)*, which I can see in the log but at same time the > > > one > > > of the Solr node goes down and the logs on the Solr nodes starts showing > > > > > > > > > *following exception.org.apache.solr.common.SolrException: no servers > > > hosting shard.* > > > For sometime the shards are not responsive and other queries are not > > > searched till the node(s) are back again. This is fine but what could be > > > the possible cause of solr node going down. > > > The other exception after the solr node goes down is leader election > > > related which is not a concern as there is no replica of the nodes. > > > > > > Please provide your suggestions. > > > > > > Thanks, > > > Modassar > >
Re: Query time out. Solr node goes down.
The servers have 32g memory each. Solr JVM memory is set to -Xms20g -Xmx24g. There are no OOM in logs. Regards, Modassar On Mon, Aug 17, 2015 at 5:06 PM, Upayavira wrote: > How much memory does each server have? How much of that memory is > assigned to the JVM? Is anything reported in the logs (e.g. > OutOfMemoryError)? > > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote: > > Hi, > > > > I have a Solr cluster which hosts around 200 GB of index on each node and > > are 6 nodes. Solr version is 5.2.1. > > When a huge query is fired, it times out *(The request took too long to > > iterate over terms.)*, which I can see in the log but at same time the > > one > > of the Solr node goes down and the logs on the Solr nodes starts showing > > > > > > *following exception.org.apache.solr.common.SolrException: no servers > > hosting shard.* > > For sometime the shards are not responsive and other queries are not > > searched till the node(s) are back again. This is fine but what could be > > the possible cause of solr node going down. > > The other exception after the solr node goes down is leader election > > related which is not a concern as there is no replica of the nodes. > > > > Please provide your suggestions. > > > > Thanks, > > Modassar >
Re: Query time out. Solr node goes down.
How much memory does each server have? How much of that memory is assigned to the JVM? Is anything reported in the logs (e.g. OutOfMemoryError)? On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote: > Hi, > > I have a Solr cluster which hosts around 200 GB of index on each node and > are 6 nodes. Solr version is 5.2.1. > When a huge query is fired, it times out *(The request took too long to > iterate over terms.)*, which I can see in the log but at same time the > one > of the Solr node goes down and the logs on the Solr nodes starts showing > > > *following exception.org.apache.solr.common.SolrException: no servers > hosting shard.* > For sometime the shards are not responsive and other queries are not > searched till the node(s) are back again. This is fine but what could be > the possible cause of solr node going down. > The other exception after the solr node goes down is leader election > related which is not a concern as there is no replica of the nodes. > > Please provide your suggestions. > > Thanks, > Modassar