Re: Memory leak in Solr

2016-12-07 Thread William Bell
What do you mean by JVM level? Run Solr on different ports on the same
machine? If you have a 32-core box, would you run 2, 3, or 4 JVMs?

On Sun, Dec 4, 2016 at 8:46 PM, Jeff Wartes  wrote:

>
> Here’s an earlier post where I mentioned some GC investigation tools:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> 201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E
>
> In my experience, there are many aspects of the Solr/Lucene memory
> allocation model that scale with things other than documents returned.
> (such as cardinality, or simply index size) A single query on a large index
> might consume dozens of megabytes of heap to complete. But that heap should
> also be released quickly after the query finishes.
> The key characteristic of a memory leak is that the software is allocating
> memory that it cannot reclaim. If it’s a leak, you ought to be able to
> reproduce it at any query rate - have you tried this? A run with, say, half
> the rate, over twice the duration?
>
> I’m inclined to agree with others here, that although you’ve correctly
> attributed the cause to GC, it’s probably less an indication of a leak, and
> more an indication of simply allocating memory faster than it can be
> reclaimed, combined with the long pauses that are increasingly unavoidable
> as heap size goes up.
> Note that in the case of a CMS allocation failure, the fallback full-GC is
> *single threaded*, which means it’ll usually take considerably longer than
> a normal GC - even for a comparable amount of garbage.
>
> In addition to GC tuning, you can address these by sharding more, both at
> the core and jvm level.
>
>
> On 12/4/16, 3:46 PM, "Shawn Heisey"  wrote:
>
> On 12/3/2016 9:46 PM, S G wrote:
> > The symptom we see is that the java clients querying Solr see
> response
> > times in 10s of seconds (not milliseconds).
> 
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate
> VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
>
> The heap memory usage number isn't useful.  It doesn't cover all the
> memory used.
>
> > *Stats from QueryHandler/select*
> > - requests:78,557
> > - errors:358
> > - timeouts:0
> > - totalTime:1,639,975.27
> > - avgRequestsPerSecond:2.62
> > - 5minRateReqsPerSecond:1.39
> > - 15minRateReqsPerSecond:1.64
> > - avgTimePerRequest:20.87
> > - medianRequestTime:0.70
> > - 75thPcRequestTime:1.11
> > - 95thPcRequestTime:191.76
>
> These times are in *milliseconds*, not seconds .. and these are even
> better numbers than you showed before.  Where are you seeing 10 plus
> second query times?  Solr is not showing numbers like that.
>
> If your VM host has 16 VMs on it and each one has a total memory size
> of
> 92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
> oversubscribed, and this is going to lead to terrible performance...
> but
> the numbers you've shown here do not show terrible performance.
>
> > Plus, on every server, we are seeing lots of exceptions.
> > For example:
> >
> > Between 8:06:55 PM and 8:21:36 PM, exceptions are:
> >
> > 1) Request says it is coming from leader, but we are the leader:
> > update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_
> 1456430020/&wt=javabin&version=2
> >
> > 2) org.apache.solr.common.SolrException: Request says it is coming
> from
> > leader, but we are the leader
> >
> > 3) org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 4) null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 5) org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 6) null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 7) org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers
> > available to handle this request. Zombie server list:
> > [HOSTA_ca_1_1456429897]
> >
> > 8) null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers
> > available to handle t

Re: Memory leak in Solr

2016-12-04 Thread Jeff Wartes

Here’s an earlier post where I mentioned some GC investigation tools:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E

In my experience, there are many aspects of the Solr/Lucene memory allocation 
model that scale with things other than documents returned (such as 
cardinality, or simply index size). A single query on a large index might 
consume dozens of megabytes of heap to complete. But that heap should also be 
released quickly after the query finishes.
The key characteristic of a memory leak is that the software is allocating 
memory that it cannot reclaim. If it’s a leak, you ought to be able to 
reproduce it at any query rate - have you tried this? A run with, say, half the 
rate, over twice the duration?
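
If you don't already have a convenient way to control the rate, a minimal
sketch - assuming a hypothetical queries.txt file with one captured query URL
per line - would be something like:

  while read -r url; do
    curl -s -o /dev/null "$url"   # replay one captured query
    sleep 0.4                     # ~2.5 req/sec; double the sleep to halve the rate
  done < queries.txt

If the heap still climbs without bound at the lower rate, that points at a
leak; if it levels off, it's allocation pressure.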

I’m inclined to agree with others here, that although you’ve correctly 
attributed the cause to GC, it’s probably less an indication of a leak, and 
more an indication of simply allocating memory faster than it can be reclaimed, 
combined with the long pauses that are increasingly unavoidable as heap size 
goes up.
Note that in the case of a CMS allocation failure, the fallback full-GC is 
*single threaded*, which means it’ll usually take considerably longer than a 
normal GC - even for a comparable amount of garbage.
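
If you want to see those in the logs, the standard pre-Java-9 GC logging flags
(the log path here is just an example) make them easy to spot:

  -Xloggc:/var/log/solr/gc.log -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime

Then grep the log for "concurrent mode failure" and "Full GC" entries and look
at their pause times.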

In addition to GC tuning, you can address these by sharding more, both at the 
core and JVM level.


On 12/4/16, 3:46 PM, "Shawn Heisey"  wrote:

On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> 
update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one serve

Re: Memory leak in Solr

2016-12-04 Thread Shawn Heisey
On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 11) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 12) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast

These errors sound like timeouts, possibly caused by long GC pauses ...
but as already mentioned, the query handler statistics do not indicate
long query times.  If a long GC were to happen during a query, then the
query time would be long as well.

The core information above doesn't include the size of the index on
disk.  That number would be useful for telling you whether there's
enough memory.
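
The quickest way to get that, assuming a stock directory layout (adjust the
path to wherever your core's data directory actually lives), is:

  du -sh /path/to/solr/home/<core_name>/data/index

As a rule of thumb you want enough free RAM beyond the heap for the OS to
cache a good chunk of that.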

As I said at the beginning of the thread, I haven't seen anything here
to indicate a memory leak, and others are using version 4.10 without any
problems.  If there were a memory leak in a released version of Solr,
many people would have run into problems with it.

Thanks,
Shawn



Re: Memory leak in Solr

2016-12-04 Thread Walter Underwood
That is a huge heap.

Once you have enough heap memory to hold a Java program’s working set,
more memory doesn’t make it faster. It just makes the GC take longer.

If you have GC monitoring, look at how much memory is in use after a full GC.
Add the space for new generation (eden, whatever), then a bit more for 
burst memory usage. Set the heap to that.
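
A low-tech way to watch that, assuming the JDK tools are installed on the box:

  jstat -gc <solr-pid> 10000

The OU column (old generation used, in KB) right after a full GC is roughly
your live set; size the heap from that plus the new generation and headroom.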

I recommend fairly large new generation memory allocation. An HTTP service
has a fair amount of allocation that has a lifetime of one HTTP request. Those
allocations should never be promoted to tenured space.

We run with an 8G heap and a 2G new generation with 4.10.4.

Of course, make sure you are running some sort of parallel GC. You can use
G1 or use CMS with ParNew, your choice. We are running CMS/ParNew, but
will be experimenting with G1 soon.
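
For reference, one way to express roughly that layout on the command line (the
occupancy settings are just a common starting point, not a prescription):

  -Xms8g -Xmx8g -Xmn2g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

Pinning Xms and Xmx to the same value also avoids heap resizing under load.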

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 4, 2016, at 11:07 AM, S G  wrote:
> 
> Thank you Eric.
> Our Solr version is 4.10 and we are not doing any sorting or faceting.
> 
> I am trying to find some ways of investigating this problem.
> Hence asking a few more questions to see what are the normal steps taken in
> such situations.
> (I did search a few of them on the Internet but could not find anything
> good).
> Any pointers provided here will help us resolve a little more quickly.
> 
> 
> 1) Is there a conclusive way to know about the memory leaks?
>  How does Solr ensure with each release that there are no memory leaks?
>  With a heap 24gb (-Xmx parameter), I sometimes see GC pauses of about 1
> second now.
>  Looks like we will need to scale it down.
>  Total VM memory is 92gb and Solr is the only process running on it.
> 
> 
> 2) How can I know that the zookeeper connectivity to Solr is not good?
>  What commands/steps are normally used to resolve this?
>  Does Solr has some metrics that share the zookeeper interaction
> statistics?
> 
> 
> 3) In a span of 9 hours, I see:
>  4 times: java.net.SocketException: Connection reset
>  32 times: java.net.SocketTimeoutException: Read timed out
> 
> And several other exceptions that ultimately bring a whole shard down
> (leader is recovery-failed and replica is down).
> 
> I understand that the above information might not be sufficient to get the
> full picture.
> But just in case, someone has resolved or debugged these issues before,
> please share your experience.
> It would be of great help to me.
> 
> Thanks,
> SG
> 
> 
> 
> 
> 
> On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson 
> wrote:
> 
>> All of this is consistent with not having a properly
>> tuned Solr instance wrt # documents, usage
>> pattern, memory allocated to the JVM, GC
>> settings and the like.
>> 
>> Your leader issues can be explained by long
>> GC pauses too. Zookeeper periodically pings
>> each replica it knows about and if the response
>> times out (due to GC in this case) then Zookeeper
>> thinks the node has gone away and marks
>> it as "down". Similarly when a leader forwards
>> an update to a follower and the request times
>> out, the leader will mark the follower as down.
>> Do this enough and the state of the cluster gets
>> "interesting".
>> 
>> You still haven't told us what version of Solr
>> you're using, the "Version" you took from
>> the core stats is the version of the _index_,
>> not Solr.
>> 
>> You have almost 200M documents on
>> a single core. That's definitely on the high side,
>> although I've seen that work. Assuming
>> you aren't doing things like faceting and
>> sorting and the like on non docValues fields.
>> 
>> As others have pointed out, the link you
>> provided doesn't provide much in the way of
>> any "smoking guns" as far as a memory
>> leak is concerned.
>> 
>> I've certainly seen situations where memory
>> required by Solr is close to the total memory
>> allocated to the JVM for instance. Then the GC
>> cycle kicks in and recovers just enough to
>> go on for a very brief time before going into another
>> GC cycle resulting in very poor performance.
>> 
>> So overall this looks like you need to do some
>> serious tuning of your Solr instances, take a
>> hard look at how you're using your physical
>> machines. You specify that these are VMs,
>> but how many VMs are you running per box?
>> How much JVM have you allocated for each?
>> How much total physical memory do you have
>> to work with per box?
>> 
>> Even if you provide the answers to the above
>> questions, there's not much we can do to
>> help you resolve your issues assuming it's
>> simply inappropriate sizing. I'd really recommend
>> you create a stress environment so you can
>> test different scenarios to become confident about
>> your expected performance, here's a blog on the
>> subject:
>> 
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
>> the-abstract-why-we-dont-have-a-definitive-answer/
>> 
>> Best,
>> Erick
>> 
>> On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
>>> The symptom we see is that the java clients querying Solr see r

Re: Memory leak in Solr

2016-12-04 Thread S G
Thank you, Erick.
Our Solr version is 4.10 and we are not doing any sorting or faceting.

I am trying to find some ways of investigating this problem.
Hence I am asking a few more questions to see what the normal steps are in
such situations.
(I did search for a few of them on the Internet but could not find anything
good.)
Any pointers provided here will help us resolve this a little more quickly.


1) Is there a conclusive way to detect memory leaks?
   How does Solr ensure with each release that there are no memory leaks?
   With a 24gb heap (-Xmx parameter), I sometimes see GC pauses of about 1
second now.
   Looks like we will need to scale it down.
  Total VM memory is 92gb and Solr is the only process running on it.


2) How can I know that the ZooKeeper connectivity to Solr is not good?
   What commands/steps are normally used to resolve this?
   Does Solr have some metrics that show the ZooKeeper interaction
statistics?


3) In a span of 9 hours, I see:
  4 times: java.net.SocketException: Connection reset
  32 times: java.net.SocketTimeoutException: Read timed out

And several other exceptions that ultimately bring a whole shard down
(leader is recovery-failed and replica is down).

I understand that the above information might not be sufficient to get the
full picture.
But just in case someone has resolved or debugged these issues before,
please share your experience.
It would be of great help to me.

Thanks,
SG





On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson 
wrote:

> All of this is consistent with not having a properly
> tuned Solr instance wrt # documents, usage
> pattern, memory allocated to the JVM, GC
> settings and the like.
>
> Your leader issues can be explained by long
> GC pauses too. Zookeeper periodically pings
> each replica it knows about and if the response
> times out (due to GC in this case) then Zookeeper
> thinks the node has gone away and marks
> it as "down". Similarly when a leader forwards
> an update to a follower and the request times
> out, the leader will mark the follower as down.
> Do this enough and the state of the cluster gets
> "interesting".
>
> You still haven't told us what version of Solr
> you're using, the "Version" you took from
> the core stats is the version of the _index_,
> not Solr.
>
> You have almost 200M documents on
> a single core. That's definitely on the high side,
> although I've seen that work. Assuming
> you aren't doing things like faceting and
> sorting and the like on non docValues fields.
>
> As others have pointed out, the link you
> provided doesn't provide much in the way of
> any "smoking guns" as far as a memory
> leak is concerned.
>
> I've certainly seen situations where memory
> required by Solr is close to the total memory
> allocated to the JVM for instance. Then the GC
> cycle kicks in and recovers just enough to
> go on for a very brief time before going into another
> GC cycle resulting in very poor performance.
>
> So overall this looks like you need to do some
> serious tuning of your Solr instances, take a
> hard look at how you're using your physical
> machines. You specify that these are VMs,
> but how many VMs are you running per box?
> How much JVM have you allocated for each?
> How much total physical memory do you have
> to work with per box?
>
> Even if you provide the answers to the above
> questions, there's not much we can do to
> help you resolve your issues assuming it's
> simply inappropriate sizing. I'd really recommend
> you create a stress environment so you can
> test different scenarios to become confident about
> your expected performance, here's a blog on the
> subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
> the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
> > The symptom we see is that the java clients querying Solr see response
> > times in 10s of seconds (not milliseconds).
> > And on the tomcat's gc.log file (where Solr is running), we see very bad
> GC
> > pauses - threads being paused for 0.5 seconds per second approximately.
> >
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
> >
> > *Stats from QueryHandler/select*
> > - requests:78,557
> > - errors:358
> > - timeouts:0
> > - totalTime:1,639,975.27
> > - avgRequestsPerSecond:2.62
> > - 5minRateReqsPerSecond:1.39
> > - 15minRateReqsPerSecond:1.64
> > - avgTimePerRequest:20.87
> > - medianRequestTime:0.70
> > - 75thPcRequestTime:1.11
> > - 95thPcRequestTime:191.76
> >
> > *Stats from QueryHandler/update*
> > - requests:33,555
> > - errors:0
> > - timeouts:0
> > - totalTime:227,870.58
> > - avgRequestsPerSecond:1.12
>

Re: Memory leak in Solr

2016-12-04 Thread Erick Erickson
All of this is consistent with not having a properly
tuned Solr instance wrt # documents, usage
pattern, memory allocated to the JVM, GC
settings and the like.

Your leader issues can be explained by long
GC pauses too. Zookeeper periodically pings
each replica it knows about and if the response
times out (due to GC in this case) then Zookeeper
thinks the node has gone away and marks
it as "down". Similarly when a leader forwards
an update to a follower and the request times
out, the leader will mark the follower as down.
Do this enough and the state of the cluster gets
"interesting".

You still haven't told us what version of Solr
you're using, the "Version" you took from
the core stats is the version of the _index_,
not Solr.

You have almost 200M documents on
a single core. That's definitely on the high side,
although I've seen that work, assuming
you aren't doing things like faceting and
sorting and the like on non-docValues fields.
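
In schema.xml that is just a matter of adding docValues="true" to the fields
you facet or sort on and reindexing; the field below is only an example:

  <field name="category" type="string" indexed="true" stored="true" docValues="true" />

Fields without docValues get uninverted into the on-heap fieldCache instead,
which is the growth pattern to avoid at this scale.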

As others have pointed out, the link you
provided doesn't provide much in the way of
any "smoking guns" as far as a memory
leak is concerned.

I've certainly seen situations where memory
required by Solr is close to the total memory
allocated to the JVM for instance. Then the GC
cycle kicks in and recovers just enough to
go on for a very brief time before going into another
GC cycle resulting in very poor performance.

So overall this looks like you need to do some
serious tuning of your Solr instances, take a
hard look at how you're using your physical
machines. You specify that these are VMs,
but how many VMs are you running per box?
How much JVM have you allocated for each?
How much total physical memory do you have
to work with per box?

Even if you provide the answers to the above
questions, there's not much we can do to
help you resolve your issues assuming it's
simply inappropriate sizing. I'd really recommend
you create a stress environment so you can
test different scenarios to become confident about
your expected performance, here's a blog on the
subject:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).
> And on the tomcat's gc.log file (where Solr is running), we see very bad GC
> pauses - threads being paused for 0.5 seconds per second approximately.
>
> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37
>
> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76
>
> *Stats from QueryHandler/update*
> - requests:33,555
> - errors:0
> - timeouts:0
> - totalTime:227,870.58
> - avgRequestsPerSecond:1.12
> - 5minRateReqsPerSecond:1.16
> - 15minRateReqsPerSecond:1.23
> - avgTimePerRequest:6.79
> - medianRequestTime:3.16
> - 75thPcRequestTime:5.27
> - 95thPcRequestTime:9.33
>
> And yet the Solr clients are reporting timeouts and very long read times.
>
> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server lis

Re: Memory leak in Solr

2016-12-03 Thread S G
The symptom we see is that the java clients querying Solr see response
times in 10s of seconds (not milliseconds).
And in Tomcat's gc.log file (where Solr is running), we see very bad GC
pauses - threads being paused for approximately 0.5 seconds per second.
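
For anyone wanting to check similar numbers against their own gc.log: assuming
-XX:+PrintGCApplicationStoppedTime is enabled, the total stop-the-world time in
a log can be summed with something like

  awk '/application threads were stopped/ {
         for (i = 1; i <= NF; i++) if ($i == "stopped:") s += $(i + 1)
       } END { print s, "seconds stopped" }' gc.log

and compared against the wall-clock time the log covers.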

Some numbers for the Solr Cloud:

*Overall infrastructure:*
- Only one collection
- 16 VMs used
- 8 shards (1 leader and 1 replica per shard - each core on separate VM)

*Overview from one core:*
- Num Docs:193,623,388
- Max Doc:230,577,696
- Heap Memory Usage:231,217,880
- Deleted Docs:36,954,308
- Version:2,357,757
- Segment Count:37

*Stats from QueryHandler/select*
- requests:78,557
- errors:358
- timeouts:0
- totalTime:1,639,975.27
- avgRequestsPerSecond:2.62
- 5minRateReqsPerSecond:1.39
- 15minRateReqsPerSecond:1.64
- avgTimePerRequest:20.87
- medianRequestTime:0.70
- 75thPcRequestTime:1.11
- 95thPcRequestTime:191.76

*Stats from QueryHandler/update*
- requests:33,555
- errors:0
- timeouts:0
- totalTime:227,870.58
- avgRequestsPerSecond:1.12
- 5minRateReqsPerSecond:1.16
- 15minRateReqsPerSecond:1.23
- avgTimePerRequest:6.79
- medianRequestTime:3.16
- 75thPcRequestTime:5.27
- 95thPcRequestTime:9.33

And yet the Solr clients are reporting timeouts and very long read times.

Plus, on every server, we are seeing lots of exceptions.
For example:

Between 8:06:55 PM and 8:21:36 PM, exceptions are:

1) Request says it is coming from leader, but we are the leader:
update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2

2) org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

3) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

4) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

5) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

6) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

7) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

8) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

9) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

10) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

11) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

12) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

Why are we seeing so many timeouts then, and why such huge response times on
the client?

Thanks
SG



On Sat, Dec 3, 2016 at 4:19 PM,  wrote:

> What tool is that ? The stats I would like to run on my Solr instance
>
> Bill Bell
> Sent from mobile
>
>
> > On Dec 2, 2016, at 4:49 PM, Shawn Heisey  wrote:
> >
> >> On 12/2/2016 12:01 PM, S G wrote:
> >> This post shows some stats on Solr which indicate that there might be a
> >> memory leak in there.
> >>
> >> http://stackoverflow.com/questions/40939166/is-this-a-
> memory-leak-in-solr
> >>
> >> Can someone please help to debug this?
> >> It might be a very good step in making Solr stable if we can fix this.
> >
> > +1 to what Walter said.
> >
> > I replied earlier on the stackoverflow question.
> >
> > FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> > something that I would characterize as "very high."  I would *love* to
> > have statistics that good.
> >
> > Even your 99th percentile request time is not much more than a full
> > second.  If a search takes a couple of seconds, most users will not
> > really care, and some might not even notice.  It's when a large
> > percentage of queries start taking several seconds that complaints start
> > coming in.  On your system, 99 percent of your queries are completing in
> > 1.3 seconds or less, and 95 percent of them are less than 17
> > milliseconds.  That sounds quite good to me.
> >
> > In my experience, the time it takes for the browser to receive the
> > search result page and render it is a significant part of the total time
> > to see results, and often dwarfs the time spent getting info from Solr.
> >
> > Here'

Re: Memory leak in Solr

2016-12-03 Thread billnbell
What tool is that? I would like to run those stats on my Solr instance.

Bill Bell
Sent from mobile


> On Dec 2, 2016, at 4:49 PM, Shawn Heisey  wrote:
> 
>> On 12/2/2016 12:01 PM, S G wrote:
>> This post shows some stats on Solr which indicate that there might be a
>> memory leak in there.
>> 
>> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>> 
>> Can someone please help to debug this?
>> It might be a very good step in making Solr stable if we can fix this.
> 
> +1 to what Walter said.
> 
> I replied earlier on the stackoverflow question.
> 
> FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> something that I would characterize as "very high."  I would *love* to
> have statistics that good.
> 
> Even your 99th percentile request time is not much more than a full
> second.  If a search takes a couple of seconds, most users will not
> really care, and some might not even notice.  It's when a large
> percentage of queries start taking several seconds that complaints start
> coming in.  On your system, 99 percent of your queries are completing in
> 1.3 seconds or less, and 95 percent of them are less than 17
> milliseconds.  That sounds quite good to me.
> 
> In my experience, the time it takes for the browser to receive the
> search result page and render it is a significant part of the total time
> to see results, and often dwarfs the time spent getting info from Solr.
> 
> Here's some numbers from Solr in my organization:
> 
> requests:   4102054
> errors: 364894
> timeouts:   49
> totalTime:  799446287.45041
> avgRequestsPerSecond:   1.2375565828793849
> 5minRateReqsPerSecond:  0.8444329508327961
> 15minRateReqsPerSecond: 0.8631197328073346
> avgTimePerRequest:  194.88926460997587
> medianRequestTime:  20.8566605
> 75thPcRequestTime:  85.5132884999
> 95thPcRequestTime:  2202.27746654
> 99thPcRequestTime:  5280.375381280002
> 999thPcRequestTime: 6866.020122961001
> 
> The numbers above come from a distributed index that contains 167
> million documents and takes up about 200GB of disk space across two
> machines.
> 
> requests:   192683
> errors: 124
> timeouts:   0
> totalTime:  199380421.985073
> avgRequestsPerSecond:   0.04722771354554
> 5minRateReqsPerSecond:  0.00800545427600684
> 15minRateReqsPerSecond: 0.017521222412364163
> avgTimePerRequest:  1034.7587591280653
> medianRequestTime:  541.591858
> 75thPcRequestTime:  1683.83246125
> 95thPcRequestTime:  5644.542019949997
> 99thPcRequestTime:  9445.592394760004
> 999thPcRequestTime: 14602.166640771007
> 
> These numbers are from an index with about 394 million documents, taking
> up nearly 500GB of disk space.  This index is also distributed on
> multiple machines.
> 
> Are you experiencing any problems other than what you perceive as slow
> queries?  I asked some other questions on stackoverflow.  In particular,
> I'd like to know the total memory on the server, the total number of
> documents (maxDoc and numDoc) you're handling with this server, as well
> as the total index size.  What do your queries look like?  What version
> and vendor of Java are you using?  Can you share your config/schema?
> 
> A memory leak is very unlikely, unless your Java or your operating
> system is broken.  I can't say for sure that it's not happening, but
> it's just not something we see around here.
> 
> Here's what I have collected on performance issues in Solr.  This page
> does mostly concern itself with memory, though it touches briefly on
> other topics:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems
> 
> Thanks,
> Shawn
> 


Re: Memory leak in Solr

2016-12-03 Thread Greg Harris
Hi,

All your stats show is that Solr has large memory requirements. There is no
direct mapping from number of documents and queries to memory requirements as
requested in that article. Different Solr projects can yield extremely,
extremely different requirements. If you want to understand your memory
usage better, you need to take a heap dump and analyze it with something
like Eclipse MemoryAnalyzer or YourKit. It's stop-the-world (STW), so you will
have a little bit of downtime. In 4.10 I'd almost guess outright that your
culprit is not using docValues for the fields being faceted, grouped, or sorted
on, leaving you with a large fieldCache and large memory requirements which
will not be cleaned up by a GC as they are still "live objects". While I
couldn't say that's true for sure without more analysis, it's, IME, pretty common.
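
For reference, one way to grab one, assuming the standard JDK tools are on the
box (the output path is just an example):

  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

The "live" option forces a full GC first, which is where the pause comes from;
drop it if you also want to see collectable garbage in the dump.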

Greg


On Dec 2, 2016 11:01 AM, "S G"  wrote:

Hi,

This post shows some stats on Solr which indicate that there might be a
memory leak in there.

http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr

Can someone please help to debug this?
It might be a very good step in making Solr stable if we can fix this.

Thanks
SG


Re: Memory leak in Solr

2016-12-02 Thread Shawn Heisey
On 12/2/2016 12:01 PM, S G wrote:
> This post shows some stats on Solr which indicate that there might be a
> memory leak in there.
>
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.

+1 to what Walter said.

I replied earlier on the stackoverflow question.

FYI -- your 95th percentile request time of about 16 milliseconds is NOT
something that I would characterize as "very high."  I would *love* to
have statistics that good.

Even your 99th percentile request time is not much more than a full
second.  If a search takes a couple of seconds, most users will not
really care, and some might not even notice.  It's when a large
percentage of queries start taking several seconds that complaints start
coming in.  On your system, 99 percent of your queries are completing in
1.3 seconds or less, and 95 percent of them are less than 17
milliseconds.  That sounds quite good to me.

In my experience, the time it takes for the browser to receive the
search result page and render it is a significant part of the total time
to see results, and often dwarfs the time spent getting info from Solr.

Here's some numbers from Solr in my organization:

requests:   4102054
errors: 364894
timeouts:   49
totalTime:  799446287.45041
avgRequestsPerSecond:   1.2375565828793849
5minRateReqsPerSecond:  0.8444329508327961
15minRateReqsPerSecond: 0.8631197328073346
avgTimePerRequest:  194.88926460997587
medianRequestTime:  20.8566605
75thPcRequestTime:  85.5132884999
95thPcRequestTime:  2202.27746654
99thPcRequestTime:  5280.375381280002
999thPcRequestTime: 6866.020122961001

The numbers above come from a distributed index that contains 167
million documents and takes up about 200GB of disk space across two
machines.

requests:   192683
errors: 124
timeouts:   0
totalTime:  199380421.985073
avgRequestsPerSecond:   0.04722771354554
5minRateReqsPerSecond:  0.00800545427600684
15minRateReqsPerSecond: 0.017521222412364163
avgTimePerRequest:  1034.7587591280653
medianRequestTime:  541.591858
75thPcRequestTime:  1683.83246125
95thPcRequestTime:  5644.542019949997
99thPcRequestTime:  9445.592394760004
999thPcRequestTime: 14602.166640771007

These numbers are from an index with about 394 million documents, taking
up nearly 500GB of disk space.  This index is also distributed on
multiple machines.

Are you experiencing any problems other than what you perceive as slow
queries?  I asked some other questions on stackoverflow.  In particular,
I'd like to know the total memory on the server, the total number of
documents (maxDoc and numDoc) you're handling with this server, as well
as the total index size.  What do your queries look like?  What version
and vendor of Java are you using?  Can you share your config/schema?

A memory leak is very unlikely, unless your Java or your operating
system is broken.  I can't say for sure that it's not happening, but
it's just not something we see around here.

Here's what I have collected on performance issues in Solr.  This page
does mostly concern itself with memory, though it touches briefly on
other topics:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Memory leak in Solr

2016-12-02 Thread Scott Blum
Are you sure it's an actual leak, not just memory pinned by caches?

Related: https://issues.apache.org/jira/browse/SOLR-9810

On Fri, Dec 2, 2016 at 2:01 PM, S G  wrote:

> Hi,
>
> This post shows some stats on Solr which indicate that there might be a
> memory leak in there.
>
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.
>
> Thanks
> SG
>


Re: Memory leak in Solr

2016-12-02 Thread Walter Underwood
We’ve been running Solr 4.10.4 in prod for a couple of years. There aren’t any 
obvious
memory leaks in it. It stays up for months.

Objects ejected from the cache will almost always be tenured, so that tends to 
cause 
full GCs.

If there are very few repeats in your query load, you’ll see a lot of cache 
ejections. 
This can also happen if you have an HTTP cache in front of the Solr hosts.
What are the hit rates on the Solr caches?

Also, are you using “NOW” in your queries? That will cause a very low hit rate
on the query result cache.
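
The cache hit rates are visible in the admin UI, or straight from the stats
handler (host and collection names here are placeholders):

  curl "http://localhost:8983/solr/collection1/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true"

If NOW is the culprit, rounding it in date filters (NOW/HOUR, NOW/DAY) usually
makes those queries cacheable again.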

We can’t help without a lot more information, like your search architecture, 
the 
search collections, the query load, cache sizes, etc.

Finally, this is not a question for the dev list. This belongs on solr-user, so 
I’m
dropping the reply to the dev list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 2, 2016, at 11:01 AM, S G  wrote:
> 
> Hi,
> 
> This post shows some stats on Solr which indicate that there might be a 
> memory leak in there.
> 
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr 
> 
> 
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.
> 
> Thanks
> SG



Re: Memory Leak in solr 4.8.1

2015-04-09 Thread pras.venkatesh
I don't have a filter cache; I have completely disabled it, since I am not
using filter queries.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Leak-in-solr-4-8-1-tp4198488p4198716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory Leak in solr 4.8.1

2015-04-08 Thread Toke Eskildsen
On Wed, 2015-04-08 at 14:00 -0700, pras.venkatesh wrote:
> 1. 8 nodes, 4 shards (2 nodes per shard)
> 2. each node has about 55 GB of data; in total there are 450 million
> documents in the collection, so the document size is not huge,

So ~120M docs/shard.

> 3. The schema has 42 fields; it gets reloaded every 15 mins with about
> 50,000 documents. Now we have a primary key for the index, so when there are
> any duplicates the document gets re-written.
> 4. The GC policy is CMS, with heap size min and max = 8 gb and perm size =
> 512 mb and RAM on the VM is 24 gb.

Do you have a large and active filter cache? Each entry is 30MB, so it
does not take many entries to fill an 8GB heap. That would match the
description of ever-running GC.
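
(For reference: a filterCache entry is essentially a bitset over the core's
documents, i.e. roughly maxDoc/8 bytes per entry, which is how you end up with
tens of megabytes per cached filter at this index size.)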

- Toke Eskildsen, State and University Library, Denmark




Re: Memory Leak in Solr?

2011-12-06 Thread Samarendra Pratap
Hi, one of the problems is now alleviated.

 The number of lines with "can't identify protocol" in the "lsof" output is now
much lower. Earlier it kept increasing up to "ulimit -n", causing the
"Too many open files" error, but now it stays at a much smaller number. This
happened after I changed maxIdleTime from 10s to 50s in jetty.xml.

*...*
*<Set name="maxIdleTime">50000</Set>*
*...*
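
For anyone tracking the same thing, the count can be watched over time with
something like

  lsof -p <solr-pid> | grep -c "can't identify protocol"

run from cron or a shell loop.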



But my original problem of **heavy swap usage** is still unresolved. If I
find a solution or work-around I'll post it here. In the meantime, if
someone knows the reason or is interested in helping me find it,
please reply.

Thanks



On Sun, Dec 4, 2011 at 3:28 PM, Samarendra Pratap wrote:

> Hi Chris,
>  Thanks for you reply and sorry for delay. Please find my replies below in
> the mail.
>
> On Sat, Dec 3, 2011 at 5:56 AM, Chris Hostetter 
> wrote:
>
>>
>> : Till 3 days ago, we were running Solr 3.4 instance with following java
>> : command line options
>> : java -server -*Xms2048m* -*Xmx4096m* -Dsolr.solr.home=etc -jar start.jar
>> :
>> : Then we increased the memory with following options and restarted the
>> : server
>> : java -server *-**Xms4096m* -*Xmx10g* -Dsolr.solr.home=etc -jar start.jar
>>...
>> : Since we restarted Solr, the memory usage of application is continuously
>> : increasing. The swap usage goes from almost zero to as high as 4GB in
>> every
>> : 6-8 hours. We kept restarting the Solr to push it down to ~zero but the
>> : same memory usage trend kept repeating itself.
>>
>> do you really mean "swap" in that sentence, or do you mean the amount of
>> memory your OS says java is using?  You said you have 16GB total
>> physical ram, how big is the index itself? do you have any other processes
>> running on that machine?  (You should ideally leave at least enough ram
>> free to let the OS/filesystem cache the index in RAM)
>>
>> Yes, by "swap" i mean "swap". Which we can see by "free -m" on linux and
> many other ways. So it is not the memory for java.
> The index size is around 31G.
> We have this machine dedicated for Solr, so no other significant processes
> are run here, except incremental indexing script. I didn't think about
> filesystem cache in RAM earlier, but since we have 16G ram so in my opinion
> that should be enough.
>
> Since you've not only changed the Xmx (max heap size) param but also the
>> Xms param (min heap size) to 4GB, it doesn't seem out of the ordinary
>> at all for the memory usage to jump up to 4GB quickly.  If the JVM did
>> exactly what the docs say it should, then on startup it would
>> *immediatley* allocated 4GB or ram, but i think in practice it allocates
>> as needed, but doesn't do any garbage collection if the memory used is
>> still below the "Xms" value.
>>
>> : Then finally I reverted the least expected change, the command line
>> memory
>> : options, back to min 2g, max 4g and I was surprised to see that the
>> problem
>> : vanished.
>> : java -server *-Xms2g* *-Xmx4g* -Dsolr.solr.home=etc -jar start.jar
>> :
>> : Is this a memory leak or my lack of understanding of java/linux memory
>> : allocation?
>>
>> I think you're just missunderstanding the allocation ... if you tell java
>> to use at leaast 4GB, it's going to use at least 4GB w/o blinking.
>>
>> I accept I wrote the confusing word "min" for -Xms, but I promise I
> really I know its meaning. :-)
>
>  did you try "-Xms2g -Xmx10g" ?
>>
>> (again: don't set Xmx any higher then you actually have the RAM to
>> support given the filesystem cache and any other stuff you have running,
>> but you can increase mx w/o increasing ms if you are just worried about
>> how fast the heap grows on startup ... not sure why that would be
>> worrisome though
>>
> As I've written in the mail above that I really meant "swap", I am not
> really concerned about heap size at startup.
>
>
>>
>> -Hoss
>>
>
> My concern is that when a single machine was able to serve n1+n2 queries
> earlier with -Xms2g -Xmx4g
> why the same machine is not able to serve n2 queries with -Xms4g -Xmx10g?
>
> In fact I tried other combinations as well 2g-6g, 1g-6g, 2g-10g but
> nothing replicated the issue.
>
> Since yesterday I am able to see another issue in the same machine. I saw
> "Too many open files" error in the log thus creating problem in incremental
> indexing.
>
> A lot of lines of the lsof were like following -
> java 1232 solr   52u sock0,51805813279
> can't identify protocol
> java 1232 solr   53u sock0,51805813282
> can't identify protocol
> java 1232 solr   54u sock0,51805813283
> can't identify protocol
>
> I searched for "can't identify protocol" and my case seemed related to a
> bug http://bugs.sun.com/view_bug.do?bug_id=6745052 but my java version
> ("1.6.0_22") did not match in the bug description.
>
> I am not sure if this problem and the memory problem could be related. I
> did not check the lsof earlier. Could this be a reason of memory le

Re: Memory Leak in Solr?

2011-12-04 Thread Samarendra Pratap
Hi Chris,
 Thanks for your reply and sorry for the delay. Please find my replies below in
the mail.

On Sat, Dec 3, 2011 at 5:56 AM, Chris Hostetter wrote:

>
> : Till 3 days ago, we were running Solr 3.4 instance with following java
> : command line options
> : java -server -*Xms2048m* -*Xmx4096m* -Dsolr.solr.home=etc -jar start.jar
> :
> : Then we increased the memory with following options and restarted the
> : server
> : java -server *-**Xms4096m* -*Xmx10g* -Dsolr.solr.home=etc -jar start.jar
>...
> : Since we restarted Solr, the memory usage of application is continuously
> : increasing. The swap usage goes from almost zero to as high as 4GB in
> every
> : 6-8 hours. We kept restarting the Solr to push it down to ~zero but the
> : same memory usage trend kept repeating itself.
>
> do you really mean "swap" in that sentence, or do you mean the amount of
> memory your OS says java is using?  You said you have 16GB total
> physical ram, how big is the index itself? do you have any other processes
> running on that machine?  (You should ideally leave at least enough ram
> free to let the OS/filesystem cache the index in RAM)
>
> Yes, by "swap" i mean "swap". Which we can see by "free -m" on linux and
many other ways. So it is not the memory for java.
The index size is around 31G.
We have this machine dedicated for Solr, so no other significant processes
are run here, except incremental indexing script. I didn't think about
filesystem cache in RAM earlier, but since we have 16G ram so in my opinion
that should be enough.

Since you've not only changed the Xmx (max heap size) param but also the
> Xms param (min heap size) to 4GB, it doesn't seem out of the ordinary
> at all for the memory usage to jump up to 4GB quickly.  If the JVM did
> exactly what the docs say it should, then on startup it would
> *immediatley* allocated 4GB or ram, but i think in practice it allocates
> as needed, but doesn't do any garbage collection if the memory used is
> still below the "Xms" value.
>
> : Then finally I reverted the least expected change, the command line
> memory
> : options, back to min 2g, max 4g and I was surprised to see that the
> problem
> : vanished.
> : java -server *-Xms2g* *-Xmx4g* -Dsolr.solr.home=etc -jar start.jar
> :
> : Is this a memory leak or my lack of understanding of java/linux memory
> : allocation?
>
> I think you're just missunderstanding the allocation ... if you tell java
> to use at leaast 4GB, it's going to use at least 4GB w/o blinking.
>
> I accept I wrote the confusing word "min" for -Xms, but I promise I really
do know its meaning. :-)

 did you try "-Xms2g -Xmx10g" ?
>
> (again: don't set Xmx any higher then you actually have the RAM to
> support given the filesystem cache and any other stuff you have running,
> but you can increase mx w/o increasing ms if you are just worried about
> how fast the heap grows on startup ... not sure why that would be
> worrisome though
>
As I've written in the mail above, I really meant "swap", so I am not
really concerned about heap size at startup.


>
> -Hoss
>

My concern is: when a single machine was able to serve n1+n2 queries
earlier with -Xms2g -Xmx4g,
why is the same machine not able to serve n2 queries with -Xms4g -Xmx10g?

In fact I tried other combinations as well (2g-6g, 1g-6g, 2g-10g) but nothing
replicated the issue.

Since yesterday I have been seeing another issue on the same machine: a
"Too many open files" error in the log, which is causing problems with
incremental indexing.

A lot of lines in the lsof output were like the following -
java 1232 solr   52u  sock  0,5  1805813279  can't identify protocol
java 1232 solr   53u  sock  0,5  1805813282  can't identify protocol
java 1232 solr   54u  sock  0,5  1805813283  can't identify protocol

I searched for "can't identify protocol" and my case seemed related to a
bug http://bugs.sun.com/view_bug.do?bug_id=6745052 but my java version
("1.6.0_22") did not match in the bug description.

I am not sure if this problem and the memory problem could be related. I
did not check lsof earlier. Could this be a cause of the memory leak?

-- 
Regards,
Samar


Re: Memory Leak in Solr?

2011-12-02 Thread Chris Hostetter

: Till 3 days ago, we were running Solr 3.4 instance with following java
: command line options
: java -server -*Xms2048m* -*Xmx4096m* -Dsolr.solr.home=etc -jar start.jar
: 
: Then we increased the memory with following options and restarted the
: server
: java -server *-**Xms4096m* -*Xmx10g* -Dsolr.solr.home=etc -jar start.jar
...
: Since we restarted Solr, the memory usage of application is continuously
: increasing. The swap usage goes from almost zero to as high as 4GB in every
: 6-8 hours. We kept restarting the Solr to push it down to ~zero but the
: same memory usage trend kept repeating itself.

do you really mean "swap" in that sentence, or do you mean the amount of 
memory your OS says java is using?  You said you have 16GB total 
physical ram, how big is the index itself? do you have any other processes 
running on that machine?  (You should ideally leave at least enough ram 
free to let the OS/filesystem cache the index in RAM)
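
One quick way to tell the two apart, assuming a reasonably recent kernel, is to
compare the machine-wide picture with the JVM's own numbers:

  free -m
  grep -E 'VmRSS|VmSwap' /proc/<java-pid>/status

If VmSwap for the java process stays near zero while system swap keeps growing,
something other than the JVM heap is being pushed out.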

Since you've not only changed the Xmx (max heap size) param but also the 
Xms param (min heap size) to 4GB, it doesn't seem out of the ordinary 
at all for the memory usage to jump up to 4GB quickly.  If the JVM did 
exactly what the docs say it should, then on startup it would 
*immediately* allocate 4GB of ram, but I think in practice it allocates 
as needed, and doesn't do any garbage collection if the memory used is 
still below the "Xms" value.

: Then finally I reverted the least expected change, the command line memory
: options, back to min 2g, max 4g and I was surprised to see that the problem
: vanished.
: java -server *-Xms2g* *-Xmx4g* -Dsolr.solr.home=etc -jar start.jar
: 
: Is this a memory leak or my lack of understanding of java/linux memory
: allocation?

I think you're just misunderstanding the allocation ... if you tell java 
to use at least 4GB, it's going to use at least 4GB w/o blinking.

did you try "-Xms2g -Xmx10g" ?

(again: don't set Xmx any higher than you actually have the RAM to 
support given the filesystem cache and any other stuff you have running, 
but you can increase mx w/o increasing ms if you are just worried about 
how fast the heap grows on startup ... not sure why that would be 
worrisome though)


-Hoss