Re: Query time out. Solr node goes down.

2015-08-19 Thread Toke Eskildsen
On Tue, 2015-08-18 at 14:36 +0530, Modassar Ather wrote:
 So, Toke/Daniel, is the node showing *gone* on the Solr cloud dashboard
 because of a GC pause, i.e. it is not actually gone but ZK is unable to
 get its correct state?

That would be my guess.

 The issue is caused by a huge query with many wildcards and phrases in it.
 As I mentioned, it fails with *The request took too long to iterate over
 terms.* So does that mean the terms being expanded are what consumed the
 memory? I am just trying to understand what consumes so much memory.

If you can paste a problematic query, it is easier to see what is
happening.

- Toke Eskildsen, State and University Library, Denmark




Re: Query time out. Solr node goes down.

2015-08-18 Thread Modassar Ather
So, Toke/Daniel, is the node showing *gone* on the Solr cloud dashboard
because of a GC pause, i.e. it is not actually gone but ZK is unable to
get its correct state?
The issue is caused by a huge query with many wildcards and phrases in it.
As I mentioned, it fails with *The request took too long to iterate over
terms.* So does that mean the terms being expanded are what consumed the
memory? I am just trying to understand what consumes so much memory.
I am trying to reproduce the OOM by executing multiple queries in parallel
but have not been able to, although I see the Solr JVM's memory usage go
above 90%. So what happens to the queries executed in parallel? Do they
wait for such a long, resource-heavy query to time out or complete?
We also have migration to Java 8 on our to-do list and will try
different GC settings.
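On the question of what the expanded terms cost: here is a rough, purely
illustrative sketch (a toy model, not Solr's actual code). Lucene keeps the
term dictionary sorted, so a trailing wildcard is a cheap range scan, while a
leading wildcard forces a walk over every term, and every matched term is
held while the query is rewritten:

```python
import bisect

# Toy model of a sorted term dictionary (Lucene stores terms sorted).
# Real indexes hold millions of terms; these names are illustrative.
terms = sorted(["jump", "jumped", "runner", "running", "runs", "stun", "sun"])

def expand_trailing(prefix):
    # Trailing wildcard (run*): seek to the first candidate, then a
    # short range scan -- cheap even on a big term dictionary.
    i = bisect.bisect_left(terms, prefix)
    matches = []
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches

def expand_leading(suffix):
    # Leading wildcard (*un): no seek is possible, so EVERY term must be
    # visited and tested -- the kind of full iteration that can trip a
    # "took too long to iterate over terms" style timeout, while each
    # match accumulates in memory.
    return [t for t in terms if t.endswith(suffix)]

print(expand_trailing("run"))  # ['runner', 'running', 'runs']
print(expand_leading("un"))    # ['stun', 'sun']
```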



On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Ah ok, it's a ZK timeout then
 (org.apache.zookeeper.KeeperException$SessionExpiredException),
 which is because of your GC pause.

 The page Shawn mentioned earlier has several links on how to investigate GC
 issues and some common GC settings; it sounds like you need to tweak those.

 Generally speaking, I believe Java 8 is considered better for GC
 performance than 7, so you probably want to investigate that.  GC tuning is
 very dependent on the load on your system. You may be running close to the
 limit under normal load, and that 1 big query is enough to tip it over the
 edge.  We have seen similar issues from time to time. We are still running
 an older Java 7 build with G1GC which we found worked well for us (though
 CMS seems to be the general consensus on the list here), migrating to Java
 8 is on our list of things to do, so our settings are probably not that
 relevant.


 On 18 August 2015 at 09:04, Toke Eskildsen t...@statsbiblioteket.dk wrote:

  On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
   Kindly help me understand, even if there is a GC pause why the solr
  node
   will go down.
 
  If a stop-the-world GC is in progress, it is not possible for an
  external service to know if this is because a GC is in progress or the
  node is dead. If the GC takes longer than the relevant timeouts, the
  external conclusion is that it is dead.
 
  In your next post you state that there is very heavy GC going on, so it
  would seem that your main problem is that your heap is too small for
  your setup.
 
  Getting OOM for a 200GB index with 24GB heap is not at all impossible,
  but it is a bit of a red flag. If you have very high values for your
  caches or perform faceting on a lot of different fields, that might be
  the cause. If you describe your setup in more detail, we might be able
  to help find the cause for your relatively high heap requirement.
 
  - Toke Eskildsen, State and University Library, Denmark
 
 
 



Re: Query time out. Solr node goes down.

2015-08-18 Thread Modassar Ather
I tried to profile the memory of each Solr node. I can see GC activity
going as high as 98%, and there are many instances where it has gone
above 10%. On one of the Solr nodes I can see it reaching 45%.
Memory is fully used and has reached the maximum heap size, which is
set to 24g. During other searches I can see the error
*org.apache.solr.common.SolrException: no servers hosting shard.*
A few nodes are in the gone state. There are many instances of
*org.apache.solr.common.SolrException:
org.apache.zookeeper.KeeperException$SessionExpiredException.*
The GC logs show very busy garbage collection. Please provide your inputs.
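If it helps, the HotSpot GC-logging flags below make such logs easier to
correlate with the timeouts. This is only a sketch for a Java 7-era JVM; the
GC_LOG_OPTS variable name and the log path are assumptions to check against
your own solr.in.sh:

```bash
# Hypothetical solr.in.sh fragment (Java 7-era flags; path is an example).
GC_LOG_OPTS="-verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/solr/logs/solr_gc.log"
```

-XX:+PrintGCApplicationStoppedTime in particular reports the length of each
stop-the-world pause, which is the number to compare against the ZK timeout.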

On Tue, Aug 18, 2015 at 10:38 AM, Modassar Ather modather1...@gmail.com
wrote:

 Shawn! The container I am using is Jetty only, and the JVM settings I am
 using are the defaults that come with the Solr startup scripts. Yes, I
 have changed the JVM memory settings as mentioned.
 Kindly help me understand: even if there is a GC pause, why would the Solr
 node go down? At least for other queries it should not throw
 *org.apache.solr.common.SolrException: no servers hosting shard.*
 Why would the node throw the above exception just because one huge query
 timed out or consumed a lot of resources? Kindly help me understand under
 what conditions such an exception can arise, as I am not fully aware of it.

 Daniel! The error logs do not say whether it was a JVM crash or just Solr.
 From the exception I understand that it might have gone into a state from
 which it recovered after some time. I did not restart Solr.

 On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins danwcoll...@gmail.com
 wrote:

 When you say the Solr node goes down, what do you mean by that? From your
 comment on the logs, you obviously lose the Solr core at best (you do
 realize that having only a single replica is inherently susceptible to
 failure, right?).
 But do you mean the Solr core drops out of the collection (ZK timeout),
 the JVM stops, or the whole machine crashes?

 On 17 August 2015 at 14:17, Shawn Heisey apa...@elyograg.org wrote:

  On 8/17/2015 5:45 AM, Modassar Ather wrote:
   The servers have 32g memory each. Solr JVM memory is set to -Xms20g
   -Xmx24g. There are no OOM in logs.
 
  Are you starting Solr 5.2.1 with the included start script, or have you
  installed it into another container?
 
  Assuming you're using the download's bin/solr script, that will
  normally set Xms and Xmx to the same value, so if you have overridden
  the memory settings such that you can have different values in Xms and
  Xmx, have you also overridden the garbage collection parameters?  If you
  have, what are they set to now?  You can see all arguments used on
  startup in the JVM section of the admin UI dashboard.
 
  If you've installed in an entirely different container, or you have
  overridden the garbage collection settings, then a 24GB heap might have
  extreme garbage collection pauses, lasting long enough to exceed the
  timeout.
 
  Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
  left over for caching the index.  With 200GB of index, this is nowhere
  near enough, and is another likely source of Solr performance problems
  that cause timeouts.  This is what Upayavira was referring to in his
  reply.  For good performance with 200GB of index, you may need a lot
  more than 32GB of total RAM.
 
  https://wiki.apache.org/solr/SolrPerformanceProblems
 
  This wiki page also describes how you can use jconsole to judge how much
  heap you actually need.  24GB may be too much.
 
  Thanks,
  Shawn
 
 





Re: Query time out. Solr node goes down.

2015-08-18 Thread Toke Eskildsen
On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
 Kindly help me understand, even if there is a GC pause why the solr node
 will go down.

If a stop-the-world GC is in progress, it is not possible for an
external service to know if this is because a GC is in progress or the
node is dead. If the GC takes longer than the relevant timeouts, the
external conclusion is that it is dead.
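As a sketch of where the relevant timeout lives on the Solr side (assuming a
5.x-style solr.xml; the 60-second value is only an example, and raising it
merely buys headroom for long pauses rather than fixing the GC itself):

```xml
<!-- Hypothetical solr.xml fragment: a longer ZooKeeper session timeout
     gives stop-the-world pauses more headroom before the node is
     declared dead by the cluster. -->
<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:60000}</int>
</solrcloud>
```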

In your next post you state that there is very heavy GC going on, so it
would seem that your main problem is that your heap is too small for
your setup.

Getting OOM for a 200GB index with 24GB heap is not at all impossible,
but it is a bit of a red flag. If you have very high values for your
caches or perform faceting on a lot of different fields, that might be
the cause. If you describe your setup in more detail, we might be able
to help find the cause for your relatively high heap requirement.

- Toke Eskildsen, State and University Library, Denmark




Re: Query time out. Solr node goes down.

2015-08-18 Thread Daniel Collins
Ah ok, it's a ZK timeout then
(org.apache.zookeeper.KeeperException$SessionExpiredException),
which is because of your GC pause.

The page Shawn mentioned earlier has several links on how to investigate GC
issues and some common GC settings; it sounds like you need to tweak those.

Generally speaking, I believe Java 8 is considered better for GC
performance than 7, so you probably want to investigate that.  GC tuning is
very dependent on the load on your system. You may be running close to the
limit under normal load, and that 1 big query is enough to tip it over the
edge.  We have seen similar issues from time to time. We are still running
an older Java 7 build with G1GC which we found worked well for us (though
CMS seems to be the general consensus on the list here), migrating to Java
8 is on our list of things to do, so our settings are probably not that
relevant.
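For what it's worth, in the stock 5.x scripts the collector flags are assumed
to live in the GC_TUNE variable in solr.in.sh; the G1 values below are a
hypothetical starting point for experiments, to be validated against your own
GC logs rather than copied:

```bash
# Hypothetical GC_TUNE override in solr.in.sh -- an experiment starting
# point, not tuned advice; validate any change against real GC logs.
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:+ParallelRefProcEnabled"
```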


On 18 August 2015 at 09:04, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
  Kindly help me understand, even if there is a GC pause why the solr
 node
  will go down.

 If a stop-the-world GC is in progress, it is not possible for an
 external service to know if this is because a GC is in progress or the
 node is dead. If the GC takes longer than the relevant timeouts, the
 external conclusion is that it is dead.

 In your next post you state that there is very heavy GC going on, so it
 would seem that your main problem is that your heap is too small for
 your setup.

 Getting OOM for a 200GB index with 24GB heap is not at all impossible,
 but it is a bit of a red flag. If you have very high values for your
 caches or perform faceting on a lot of different fields, that might be
 the cause. If you describe your setup in more detail, we might be able
 to help find the cause for your relatively high heap requirement.

 - Toke Eskildsen, State and University Library, Denmark





Re: Query time out. Solr node goes down.

2015-08-18 Thread Erick Erickson
bq: The issue is caused by a huge query with many wildcards and phrases in it.

Well, the very first thing I'd do is look at whether this is necessary.

For instance:
leading-and-trailing wildcards are an anti-pattern. You should investigate
using ngrams instead.

trailing wildcards usually resolve to term queries and are usually
quite space-efficient.

leading wildcards are usually best handled by ReversedWildcardFilterFactory.

Very often, large, complex wildcarded queries are a holdover from SQL
searching, which is limited to things like %whatever% and doesn't take into
account things like the Solr analysis chain. A classic example is people
searching for run* to find runner, running, runs etc., all of which can be
handled by stemming.
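As a sketch, a schema.xml field type using ReversedWildcardFilterFactory
might look like this (the type name and tokenizer choice are illustrative).
The filter additionally indexes each term reversed, so a leading wildcard
such as *ing can be rewritten into a cheap prefix query over the reversed
terms:

```xml
<!-- Illustrative schema.xml fragment; tune the max* attributes for your
     data. withOriginal="true" keeps the forward terms too, so ordinary
     and trailing-wildcard queries still work. -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"
            withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```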

FWIW,
Erick

On Tue, Aug 18, 2015 at 2:06 AM, Modassar Ather modather1...@gmail.com wrote:
 So, Toke/Daniel, is the node showing *gone* on the Solr cloud dashboard
 because of a GC pause, i.e. it is not actually gone but ZK is unable to
 get its correct state?
 The issue is caused by a huge query with many wildcards and phrases in it.
 As I mentioned, it fails with *The request took too long to iterate over
 terms.* So does that mean the terms being expanded are what consumed the
 memory? I am just trying to understand what consumes so much memory.
 I am trying to reproduce the OOM by executing multiple queries in parallel
 but have not been able to, although I see the Solr JVM's memory usage go
 above 90%. So what happens to the queries executed in parallel? Do they
 wait for such a long, resource-heavy query to time out or complete?
 We also have migration to Java 8 on our to-do list and will try
 different GC settings.



 On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins danwcoll...@gmail.com
 wrote:

 Ah ok, it's a ZK timeout then
 (org.apache.zookeeper.KeeperException$SessionExpiredException),
 which is because of your GC pause.

 The page Shawn mentioned earlier has several links on how to investigate GC
 issues and some common GC settings; it sounds like you need to tweak those.

 Generally speaking, I believe Java 8 is considered better for GC
 performance than 7, so you probably want to investigate that.  GC tuning is
 very dependent on the load on your system. You may be running close to the
 limit under normal load, and that 1 big query is enough to tip it over the
 edge.  We have seen similar issues from time to time. We are still running
 an older Java 7 build with G1GC which we found worked well for us (though
 CMS seems to be the general consensus on the list here), migrating to Java
 8 is on our list of things to do, so our settings are probably not that
 relevant.


 On 18 August 2015 at 09:04, Toke Eskildsen t...@statsbiblioteket.dk wrote:

  On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
   Kindly help me understand, even if there is a GC pause why the solr
  node
   will go down.
 
  If a stop-the-world GC is in progress, it is not possible for an
  external service to know if this is because a GC is in progress or the
  node is dead. If the GC takes longer than the relevant timeouts, the
  external conclusion is that it is dead.
 
  In your next post you state that there is very heavy GC going on, so it
  would seem that your main problem is that your heap is too small for
  your setup.
 
  Getting OOM for a 200GB index with 24GB heap is not at all impossible,
  but it is a bit of a red flag. If you have very high values for your
  caches or perform faceting on a lot of different fields, that might be
  the cause. If you describe your setup in more detail, we might be able
  to help find the cause for your relatively high heap requirement.
 
  - Toke Eskildsen, State and University Library, Denmark
 
 
 



Re: Query time out. Solr node goes down.

2015-08-17 Thread Upayavira
How much memory does each server have? How much of that memory is
assigned to the JVM? Is anything reported in the logs (e.g.
OutOfMemoryError)?

On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
 Hi,

 I have a Solr cluster of 6 nodes, each hosting around 200 GB of index.
 The Solr version is 5.2.1.
 When a huge query is fired, it times out *(The request took too long to
 iterate over terms.)*, which I can see in the log, but at the same time
 one of the Solr nodes goes down and the logs on the Solr nodes start
 showing the following exception:
 *org.apache.solr.common.SolrException: no servers hosting shard.*
 For some time the shards are not responsive and other queries are not
 served until the node(s) are back again. That is fine, but what could be
 the possible cause of the Solr node going down?
 The other exception after the Solr node goes down is leader-election
 related, which is not a concern as there are no replicas of the nodes.

 Please provide your suggestions.

 Thanks,
 Modassar


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
The servers have 32g memory each. Solr JVM memory is set to -Xms20g
-Xmx24g. There are no OOM in logs.

Regards,
Modassar

On Mon, Aug 17, 2015 at 5:06 PM, Upayavira u...@odoko.co.uk wrote:

 How much memory does each server have? How much of that memory is
 assigned to the JVM? Is anything reported in the logs (e.g.
 OutOfMemoryError)?

 On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
  Hi,

  I have a Solr cluster of 6 nodes, each hosting around 200 GB of index.
  The Solr version is 5.2.1.
  When a huge query is fired, it times out *(The request took too long to
  iterate over terms.)*, which I can see in the log, but at the same time
  one of the Solr nodes goes down and the logs on the Solr nodes start
  showing the following exception:
  *org.apache.solr.common.SolrException: no servers hosting shard.*
  For some time the shards are not responsive and other queries are not
  served until the node(s) are back again. That is fine, but what could be
  the possible cause of the Solr node going down?
  The other exception after the Solr node goes down is leader-election
  related, which is not a concern as there are no replicas of the nodes.

  Please provide your suggestions.

  Thanks,
  Modassar



Re: Query time out. Solr node goes down.

2015-08-17 Thread Shawn Heisey
On 8/17/2015 5:45 AM, Modassar Ather wrote:
 The servers have 32g memory each. Solr JVM memory is set to -Xms20g
 -Xmx24g. There are no OOM in logs.

Are you starting Solr 5.2.1 with the included start script, or have you
installed it into another container?

Assuming you're using the download's bin/solr script, that will
normally set Xms and Xmx to the same value, so if you have overridden
the memory settings such that you can have different values in Xms and
Xmx, have you also overridden the garbage collection parameters?  If you
have, what are they set to now?  You can see all arguments used on
startup in the JVM section of the admin UI dashboard.
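As a sketch of that default (the variable name is assumed from the 5.x
solr.in.sh; double-check against your version), keeping Xms and Xmx pinned
together looks like:

```bash
# Hypothetical solr.in.sh fragment: the stock script keeps min and max
# heap equal so the heap never has to grow or shrink at runtime.
SOLR_JAVA_MEM="-Xms8g -Xmx8g"
# Equivalent one-off override at startup (assumed 5.x bin/solr flag):
#   bin/solr start -m 8g
```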

If you've installed in an entirely different container, or you have
overridden the garbage collection settings, then a 24GB heap might have
extreme garbage collection pauses, lasting long enough to exceed the
timeout.

Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
left over for caching the index.  With 200GB of index, this is nowhere
near enough, and is another likely source of Solr performance problems
that cause timeouts.  This is what Upayavira was referring to in his
reply.  For good performance with 200GB of index, you may need a lot
more than 32GB of total RAM.

https://wiki.apache.org/solr/SolrPerformanceProblems

This wiki page also describes how you can use jconsole to judge how much
heap you actually need.  24GB may be too much.

Thanks,
Shawn



Re: Query time out. Solr node goes down.

2015-08-17 Thread Upayavira
Hoping that others will chime in here with other ideas. Have you,
though, tried reducing the JVM memory, leaving more available for the OS
disk cache? Having said that, I'd expect that to improve performance,
not to cause JVM crashes.

It might also help to know what version of Java you are running.

Upayavira

On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote:
 The servers have 32g memory each. Solr JVM memory is set to -Xms20g
 -Xmx24g. There are no OOM in logs.
 
 Regards,
 Modassar
 
 On Mon, Aug 17, 2015 at 5:06 PM, Upayavira u...@odoko.co.uk wrote:
 
  How much memory does each server have? How much of that memory is
  assigned to the JVM? Is anything reported in the logs (e.g.
  OutOfMemoryError)?
 
  On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
   Hi,

   I have a Solr cluster of 6 nodes, each hosting around 200 GB of index.
   The Solr version is 5.2.1.
   When a huge query is fired, it times out *(The request took too long to
   iterate over terms.)*, which I can see in the log, but at the same time
   one of the Solr nodes goes down and the logs on the Solr nodes start
   showing the following exception:
   *org.apache.solr.common.SolrException: no servers hosting shard.*
   For some time the shards are not responsive and other queries are not
   served until the node(s) are back again. That is fine, but what could be
   the possible cause of the Solr node going down?
   The other exception after the Solr node goes down is leader-election
   related, which is not a concern as there are no replicas of the nodes.

   Please provide your suggestions.

   Thanks,
   Modassar
 


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Thanks Upayavira for your inputs. The Java version is 1.7.0_79.

On Mon, Aug 17, 2015 at 5:57 PM, Upayavira u...@odoko.co.uk wrote:

 Hoping that others will chime in here with other ideas. Have you,
 though, tried reducing the JVM memory, leaving more available for the OS
 disk cache? Having said that, I'd expect that to improve performance,
 not to cause JVM crashes.

 It might also help to know what version of Java you are running.

 Upayavira

 On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote:
  The servers have 32g memory each. Solr JVM memory is set to -Xms20g
  -Xmx24g. There are no OOM in logs.
 
  Regards,
  Modassar
 
  On Mon, Aug 17, 2015 at 5:06 PM, Upayavira u...@odoko.co.uk wrote:
 
   How much memory does each server have? How much of that memory is
   assigned to the JVM? Is anything reported in the logs (e.g.
   OutOfMemoryError)?
  
   On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
 Hi,

 I have a Solr cluster of 6 nodes, each hosting around 200 GB of index.
 The Solr version is 5.2.1.
 When a huge query is fired, it times out *(The request took too long to
 iterate over terms.)*, which I can see in the log, but at the same time
 one of the Solr nodes goes down and the logs on the Solr nodes start
 showing the following exception:
 *org.apache.solr.common.SolrException: no servers hosting shard.*
 For some time the shards are not responsive and other queries are not
 served until the node(s) are back again. That is fine, but what could be
 the possible cause of the Solr node going down?
 The other exception after the Solr node goes down is leader-election
 related, which is not a concern as there are no replicas of the nodes.

 Please provide your suggestions.

 Thanks,
 Modassar
  



Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Shawn! The container I am using is Jetty only, and the JVM settings I am
using are the defaults that come with the Solr startup scripts. Yes, I
have changed the JVM memory settings as mentioned.
Kindly help me understand: even if there is a GC pause, why would the Solr
node go down? At least for other queries it should not throw
*org.apache.solr.common.SolrException: no servers hosting shard.*
Why would the node throw the above exception just because one huge query
timed out or consumed a lot of resources? Kindly help me understand under
what conditions such an exception can arise, as I am not fully aware of it.

Daniel! The error logs do not say whether it was a JVM crash or just Solr.
From the exception I understand that it might have gone into a state from
which it recovered after some time. I did not restart Solr.

On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 When you say the Solr node goes down, what do you mean by that? From your
 comment on the logs, you obviously lose the Solr core at best (you do
 realize that having only a single replica is inherently susceptible to
 failure, right?).
 But do you mean the Solr core drops out of the collection (ZK timeout),
 the JVM stops, or the whole machine crashes?

 On 17 August 2015 at 14:17, Shawn Heisey apa...@elyograg.org wrote:

  On 8/17/2015 5:45 AM, Modassar Ather wrote:
   The servers have 32g memory each. Solr JVM memory is set to -Xms20g
   -Xmx24g. There are no OOM in logs.
 
  Are you starting Solr 5.2.1 with the included start script, or have you
  installed it into another container?
 
  Assuming you're using the download's bin/solr script, that will
  normally set Xms and Xmx to the same value, so if you have overridden
  the memory settings such that you can have different values in Xms and
  Xmx, have you also overridden the garbage collection parameters?  If you
  have, what are they set to now?  You can see all arguments used on
  startup in the JVM section of the admin UI dashboard.
 
  If you've installed in an entirely different container, or you have
  overridden the garbage collection settings, then a 24GB heap might have
  extreme garbage collection pauses, lasting long enough to exceed the
  timeout.
 
  Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
  left over for caching the index.  With 200GB of index, this is nowhere
  near enough, and is another likely source of Solr performance problems
  that cause timeouts.  This is what Upayavira was referring to in his
  reply.  For good performance with 200GB of index, you may need a lot
  more than 32GB of total RAM.
 
  https://wiki.apache.org/solr/SolrPerformanceProblems
 
  This wiki page also describes how you can use jconsole to judge how much
  heap you actually need.  24GB may be too much.
 
  Thanks,
  Shawn
 
 



Re: Query time out. Solr node goes down.

2015-08-17 Thread Daniel Collins
When you say the Solr node goes down, what do you mean by that? From your
comment on the logs, you obviously lose the Solr core at best (you do
realize that having only a single replica is inherently susceptible to
failure, right?).
But do you mean the Solr core drops out of the collection (ZK timeout),
the JVM stops, or the whole machine crashes?

On 17 August 2015 at 14:17, Shawn Heisey apa...@elyograg.org wrote:

 On 8/17/2015 5:45 AM, Modassar Ather wrote:
  The servers have 32g memory each. Solr JVM memory is set to -Xms20g
  -Xmx24g. There are no OOM in logs.

 Are you starting Solr 5.2.1 with the included start script, or have you
 installed it into another container?

 Assuming you're using the download's bin/solr script, that will
 normally set Xms and Xmx to the same value, so if you have overridden
 the memory settings such that you can have different values in Xms and
 Xmx, have you also overridden the garbage collection parameters?  If you
 have, what are they set to now?  You can see all arguments used on
 startup in the JVM section of the admin UI dashboard.

 If you've installed in an entirely different container, or you have
 overridden the garbage collection settings, then a 24GB heap might have
 extreme garbage collection pauses, lasting long enough to exceed the
 timeout.

 Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
 left over for caching the index.  With 200GB of index, this is nowhere
 near enough, and is another likely source of Solr performance problems
 that cause timeouts.  This is what Upayavira was referring to in his
 reply.  For good performance with 200GB of index, you may need a lot
 more than 32GB of total RAM.

 https://wiki.apache.org/solr/SolrPerformanceProblems

 This wiki page also describes how you can use jconsole to judge how much
 heap you actually need.  24GB may be too much.

 Thanks,
 Shawn