RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-09 Thread Tim Chen
Guys, (@Erick & @Shawn),

Thanks for the great suggestions!

I have increased Tomcat maxThreads from 200 to 10,000 on our staging 
environment. So far so good.

I will perform some more indexing test and see how it goes.

Many thanks,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, 8 August 2016 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/7/2016 6:53 PM, Tim Chen wrote:
> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
> unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:745)

I find myself chasing Erick once again. :)  Supplementing what he told you:

There are two things that might be happening here.

1) The Tomcat setting "maxThreads" may be limiting the number of threads.
This defaults to 200, and should be increased to 10,000.  The specific error 
doesn't sound like an application limit, though -- it acts more like Java 
itself can't create the thread.  If you have already adjusted maxThreads, then 
it's more likely to be the second option:

2) The operating system may be imposing a limit on the number of 
processes/threads a user is allowed to start.  On Linux systems, this is 
typically 1024.  For other operating systems, I am not sure what the default 
limit is.

Thanks,
Shawn





Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-09 Thread Shawn Heisey
On 8/8/2016 11:09 AM, Ritesh Kumar (Avanade) wrote:
> This is great, but where can I make this change in SOLR 6, as I have
> implemented CDCR?

In Solr 6, the chance of using Tomcat will be near zero, and the
maxThreads setting in Solr's Jetty config should already be set to 10,000.

If you're seeing this same OOME (can't create a new thread) in Solr 6,
then the problem is most likely going to be at the operating system
level.  Exactly how to increase the number of processes/threads that
Solr can create will vary depending on the operating system you're
running.  For help, consult documentation or support resources for your
OS, or maybe Google.
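
For a typical Linux install, the knob is the per-user process/thread limit; a 
rough sketch of checking and raising it (the "solr" account name and the 65000 
value below are only illustrative -- match them to however your Solr runs):

  # show the current max user processes for the account running Solr
  ulimit -u

  # example entries for /etc/security/limits.conf (takes effect on next login)
  solr  soft  nproc  65000
  solr  hard  nproc  65000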

If you're seeing a different problem, then please send a brand new
message to the list detailing your problem.

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn



RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-08 Thread Ritesh Kumar (Avanade)
This is great, but where can I make this change in SOLR 6, as I have implemented 
CDCR?

Ritesh K
Infrastructure Sr. Engineer – Jericho Team
Sales & Marketing Digital Services
t +91-7799936921   v-kur...@microsoft.com

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 08 August 2016 21:30
To: solr-user 
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

Yeah, Shawn, but you, like, know something about Tomcat and actually provide 
useful advice ;)

On Mon, Aug 8, 2016 at 6:44 AM, Shawn Heisey  wrote:
> On 8/7/2016 6:53 PM, Tim Chen wrote:
>> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
>> unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Thread.java:714)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at 
>> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>> at java.lang.Thread.run(Thread.java:745)
>
> I find myself chasing Erick once again. :)  Supplementing what he told you:
>
> There are two things that might be happening here.
>
> 1) The Tomcat setting "maxThreads" may be limiting the number of threads.
> This defaults to 200, and should be increased to 10,000.  The specific 
> error doesn't sound like an application limit, though -- it acts more 
> like Java itself can't create the thread.  If you have already 
> adjusted maxThreads, then it's more likely to be the second option:
>
> 2) The operating system may be imposing a limit on the number of 
> processes/threads a user is allowed to start.  On Linux systems, this 
> is typically 1024.  For other operating systems, I am not sure what 
> the default limit is.
>
> Thanks,
> Shawn
>


Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-08 Thread Erick Erickson
Yeah, Shawn, but you, like, know something about Tomcat and
actually provide useful advice ;)

On Mon, Aug 8, 2016 at 6:44 AM, Shawn Heisey  wrote:
> On 8/7/2016 6:53 PM, Tim Chen wrote:
>> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
>> unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Thread.java:714)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at 
>> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>> at java.lang.Thread.run(Thread.java:745)
>
> I find myself chasing Erick once again. :)  Supplementing what he told you:
>
> There are two things that might be happening here.
>
> 1) The Tomcat setting "maxThreads" may be limiting the number of threads.
> This defaults to 200, and should be increased to 10,000.  The specific
> error doesn't sound like an application limit, though -- it acts more
> like Java itself can't create the thread.  If you have already adjusted
> maxThreads, then it's more likely to be the second option:
>
> 2) The operating system may be imposing a limit on the number of
> processes/threads a user is allowed to start.  On Linux systems, this is
> typically 1024.  For other operating systems, I am not sure what the
> default limit is.
>
> Thanks,
> Shawn
>


Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-08 Thread Shawn Heisey
On 8/7/2016 6:53 PM, Tim Chen wrote:
> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
> unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:745)

I find myself chasing Erick once again. :)  Supplementing what he told you:

There are two things that might be happening here.

1) The Tomcat setting "maxThreads" may be limiting the number of threads.
This defaults to 200, and should be increased to 10,000.  The specific
error doesn't sound like an application limit, though -- it acts more
like Java itself can't create the thread.  If you have already adjusted
maxThreads, then it's more likely to be the second option:

2) The operating system may be imposing a limit on the number of
processes/threads a user is allowed to start.  On Linux systems, this is
typically 1024.  For other operating systems, I am not sure what the
default limit is.
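
For reference, maxThreads is an attribute on the HTTP Connector in Tomcat's 
server.xml; a minimal sketch (the port and timeout shown are illustrative -- 
edit your existing Connector rather than adding a new one):

  <Connector port="8983" protocol="HTTP/1.1"
             connectionTimeout="20000"
             maxThreads="10000" />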

Thanks,
Shawn



Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Erick Erickson
You're correct on maxWarmingSearchers, and autowarming isn't really a concern
since you're not exceeding maxWarmingSearchers.

Wait... the error is a _tomcat_ error according to the
stack trace. If this were the internals of Solr you'd be seeing
org.apache.solr in there somewhere. I've seen the
"unable o create native thread" problem when there were a
bazillion replicas on a single Solr JVM, but it doesn't sound like
this is your case.

You say you have a heavy indexing load... Are you by any chance
sending a single document per update? I usually start with sending
docs (assuming SolrJ) with the CloudSolrClient.add(docList) version
and my docList straw-man number is 1,000. If the docs are
very large, that can be too many, but "very large" here is 50K or so.
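
A minimal SolrJ sketch of that batching pattern, using the 6.x-era 
CloudSolrClient.Builder API (the ZooKeeper hosts, collection name and field 
names below are made-up placeholders, and 1,000 is just the straw-man batch 
size from above):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // hypothetical ZooKeeper ensemble and collection name
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build();
        client.setDefaultCollection("mycollection");

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("title_t", "document " + i);
            batch.add(doc);

            // send ~1,000 docs per request instead of one doc per update
            if (batch.size() >= 1000) {
                client.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);
        }
        client.commit();
        client.close();
    }
}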

In the case I saw, bumping the JVM won't help, the -Xss _might_ help
but the pathological case I was testing didn't get solved that way.

You're right, there are (somehow) just too many threads being created,
but I've seen very heavy indexing rates without this problem, so I'd
guess there's something magic here.

Of the three choices, (a) is probably best assuming you're already
batching up docs for indexing.

Best,
Erick

On Sun, Aug 7, 2016 at 6:53 PM, Tim Chen  wrote:
> Sorry Erick, forgot to answer your question:
>
> No, I didn't increase the maxWarmingSearchers. It is set to 2. I read 
> somewhere that increasing this is a risk.
>
> Just to make sure, you didn't mean the "autowarmCount" in the cache 
> settings below?
> Thanks,
> Tim
>
> Reference (cache settings from solrconfig.xml):
>
>   size="512" initialSize="512" autowarmCount="0"/>
>
>   size="512" initialSize="512" autowarmCount="32"/>
>
>   size="512" initialSize="512" autowarmCount="0"/>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 6 August 2016 2:31 AM
> To: solr-user
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
> memory
>
> You don't really have to worry that much about memory consumed during 
> indexing.
> The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount 
> of RAM consumed, when adding a doc if that limit is exceeded then the buffer 
> is flushed.
>
> So you can reduce that number, but it's default is 100M and if you're running 
> that close to your limits I suspect you'd get, at best, a bit more runway 
> before you hit the problem again.
>
> NOTE: that number isn't an absolute limit, IIUC the algorithm is
>> index a doc to the in-memory structures
>> check if the limit is exceeded and flush if so.
>
> So say you were at 99% of your ramBufferSizeMB setting and then indexed a 
> ginormous doc your in-memory stuff might be significantly bigger.
>
> Searching usually is the bigger RAM consumer, so when I say "a bit more 
> runway" what I'm thinking about is that when you start _searching_ the data 
> your memory requirements will continue to grow and you'll be back where you 
> started.
>
> And just as a sanity check: You didn't perchance increase the 
> maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's 
> really a red flag.
>
> Best,
> Erick
>
> On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
>> Thanks Guys. Very very helpful.
>>
>> I will probably look at consolidate 4 Solr servers into 2 bigger/better 
>> server - it gives more memory, and it cut down the replica the Leader needs 
>> to manage.
>>
>> Also, I may look into write a script to monitor the tomcat log and if there 
>> is OOM, kill tomcat, then restart it. A bit dirty, but may work for a short 
>> term.
>>
>> I don't know too much about how documents indexed, and how to save memory 
>> from that. Will probably work with a developer on this as well.
>>
>> Many Thanks guys.
>>
>> Cheers,
>> Tim
>>
>> -Original Message-
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, 5 August 2016 4:55 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
>> out of memory
>>
>> On 8/4/2016 8:14 PM, Tim Chen wrote:
>>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>>> like dead down, so other servers can do the election and choose the
>>> new leader. This at least avoids bringing down the whole cluster. Am
>>> I right?
>>
>> Supplementing what Erick told you:
>>
>> When a typical Java program throws OutOfMemoryError, program behavior is 
>> completely unpredictable.  There are programming techniques that can be used 
>> so that behavior IS predictable, but writing that code can be challenging.
>>
>> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
>> option to execute a script when OutOfMemoryError happens.  This script kills 
>> Solr completely.  We are working on adding this capability when running on 
>> Windows.

RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Sorry Erick, forgot to answer your question:

No, I didn't increase the maxWarmingSearchers. It is set to 2. I read 
somewhere that increasing this is a risk.

Just to make sure, you didn't mean the "autowarmCount" in the cache 
settings?

NOTE: that number isn't an absolute limit, IIUC the algorithm is
> index a doc to the in-memory structures
> check if the limit is exceeded and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a 
ginormous doc your in-memory stuff might be significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" 
what I'm thinking about is that when you start _searching_ the data your memory 
requirements will continue to grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the 
maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's really 
a red flag.

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidate 4 Solr servers into 2 bigger/better 
> server - it gives more memory, and it cut down the replica the Leader needs 
> to manage.
>
> Also, I may look into write a script to monitor the tomcat log and if there 
> is OOM, kill tomcat, then restart it. A bit dirty, but may work for a short 
> term.
>
> I don't know too much about how documents indexed, and how to save memory 
> from that. Will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
> out of memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am
>> I right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is 
> completely unpredictable.  There are programming techniques that can be used 
> so that behavior IS predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
> option to execute a script when OutOfMemoryError happens.  This script kills 
> Solr completely.  We are working on adding this capability when running on 
> Windows.
>
>> 2, Apparently we should not pushing too many documents to Solr, how
>> do you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: Increase the heap or 
> reduce Solr's memory requirements.  The number of documents you push to Solr 
> is unlikely to have a large effect on the amount of memory that Solr 
> requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>
>




RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Hi Erick, Shawn,

Thanks for following this up.

1,
For some reason, ramBufferSizeMB in our solrconfig.xml is not set to 100MB, but 
32MB.

In that case, considering we have 10G for the JVM, my understanding is we should 
not run out of memory due to a large number of documents being added to Solr.

Just to make sure I understand it correctly: the documents added to Solr will 
be stored in an internal queue in Solr, and Solr will only use that 32MB (or 
99% of 32MB plus one extra document's memory) for indexing documents. The 
documents in the queue will be indexed one by one.

2,
Based on our tomcat (Solr) access_log and website peak hours, the time we had 
our cluster failure is not likely due to _search_ traffic. E.g., we can see 
many more Solr requests with the 'update' keyword, but the usual number of 
requests with the 'select' keyword.

3,
Now, this leads me to the only reason I can think of (you mentioned this 
earlier as well):
Since each shard has 4 replicas in our setup, when there is a large number of 
documents being added, the Leader will create a lot of threads to send the 
documents to the other replica servers. All these threads are what consumed all 
the memory on the Leader server, which leads to OOM.

If my assumption is right, the options to try to fix this issue are to:
a): still limit the number of documents being added to Solr
b): change to 2 replicas for each shard (loss of data reliability, but..)
c): bump up server memory.

Am I going the right way? Any advice and suggestions are much appreciated!!

Also attached is part of the catalina.out OOM log for reference:

Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6861" java.lang.OutOfMemoryError: 
unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6671" java.lang.OutOfMemoryError: 
unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)

Many thanks,
Tim


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 6 August 2016 2:31 AM
To: solr-user
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of 
RAM consumed, when adding a doc if that limit is exceeded then the buffer is 
flushed.

So you can reduce that number, but it's default is 100M and if you're running 
that close to your limits I suspect you'd get, at best, a bit more runway 
before you hit the problem again.

NOTE: that number isn't an absolute limit, IIUC the algorithm is
> index a doc to the in-memory structures
> check if the limit is exceeded and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a 
ginormous doc your in-memory stuff might be significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" 
what I'm thinking about is that when you start _searching_ the data your memory 
requirements will continue to grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the 
maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's really 
a red flag.

Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Erick Erickson
You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of
RAM consumed: when adding a doc, if that limit is exceeded then the buffer is
flushed.
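
For reference, that setting is a single element in the indexConfig section of 
solrconfig.xml; a sketch with the default value (adjust to taste):

  <indexConfig>
    <ramBufferSizeMB>100</ramBufferSizeMB>
  </indexConfig>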

So you can reduce that number, but it's default is 100M and if you're
running that close
to your limits I suspect you'd get, at best, a bit more runway before
you hit the problem again.

NOTE: that number isn't an absolute limit, IIUC the algorithm is
> index a doc to the in-memory structures
> check if the limit is exceeded and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a
ginormous doc your in-memory stuff might be significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit
more runway" what
I'm thinking about is that when you start _searching_ the data your
memory requirements
will continue to grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the
maxWarmingSearchers
parameter in solrconfig.xml, did you? If so, that's really a red flag.

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidate 4 Solr servers into 2 bigger/better 
> server - it gives more memory, and it cut down the replica the Leader needs 
> to manage.
>
> Also, I may look into write a script to monitor the tomcat log and if there 
> is OOM, kill tomcat, then restart it. A bit dirty, but may work for a short 
> term.
>
> I don't know too much about how documents indexed, and how to save memory 
> from that. Will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
> memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am I
>> right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is 
> completely unpredictable.  There are programming techniques that can be used 
> so that behavior IS predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
> option to execute a script when OutOfMemoryError happens.  This script kills 
> Solr completely.  We are working on adding this capability when running on 
> Windows.
>
>> 2, Apparently we should not pushing too many documents to Solr, how do
>> you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: Increase the heap or 
> reduce Solr's memory requirements.  The number of documents you push to Solr 
> is unlikely to have a large effect on the amount of memory that Solr 
> requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>
>


RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Tim Chen
Thanks Guys. Very very helpful.

I will probably look at consolidating 4 Solr servers into 2 bigger/better 
servers - that gives more memory, and it cuts down the replicas the Leader 
needs to manage.

Also, I may look into writing a script to monitor the tomcat log and, if there 
is an OOM, kill tomcat and restart it. A bit dirty, but it may work for the 
short term.

I don't know too much about how documents are indexed, or how to save memory 
from that. I will probably work with a developer on this as well.

Many Thanks guys.

Cheers,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 5 August 2016 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/4/2016 8:14 PM, Tim Chen wrote:
> Couple of thoughts: 1, If Leader goes down, it should just go down,
> like dead down, so other servers can do the election and choose the
> new leader. This at least avoids bringing down the whole cluster. Am I
> right?

Supplementing what Erick told you:

When a typical Java program throws OutOfMemoryError, program behavior is 
completely unpredictable.  There are programming techniques that can be used so 
that behavior IS predictable, but writing that code can be challenging.

Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
option to execute a script when OutOfMemoryError happens.  This script kills 
Solr completely.  We are working on adding this capability when running on 
Windows.

> 2, Apparently we should not pushing too many documents to Solr, how do
> you guys handle this? Set a limit somewhere?

There are exactly two ways to deal with OOME problems: Increase the heap or 
reduce Solr's memory requirements.  The number of documents you push to Solr is 
unlikely to have a large effect on the amount of memory that Solr requires.  
Here's some information on this topic:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn





Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Shawn Heisey
On 8/4/2016 8:14 PM, Tim Chen wrote:
> Couple of thoughts: 1, If Leader goes down, it should just go down,
> like dead down, so other servers can do the election and choose the
> new leader. This at least avoids bringing down the whole cluster. Am I
> right? 

Supplementing what Erick told you:

When a typical Java program throws OutOfMemoryError, program behavior is
completely unpredictable.  There are programming techniques that can be
used so that behavior IS predictable, but writing that code can be
challenging.

Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a
Java option to execute a script when OutOfMemoryError happens.  This
script kills Solr completely.  We are working on adding this capability
when running on Windows.
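
The hook behind that is the JVM's OnOutOfMemoryError option; a sketch of the 
flag as it might appear on the Solr start command line (the script path and 
arguments here are illustrative -- the stock bin/solr script wires up its own 
oom_solr.sh):

  -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"

A similar flag could be added to CATALINA_OPTS for a Tomcat-hosted Solr, which 
is closer to the kill-and-restart idea discussed elsewhere in this thread.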

> 2, Apparently we should not pushing too many documents to Solr, how do
> you guys handle this? Set a limit somewhere? 

There are exactly two ways to deal with OOME problems: Increase the heap
or reduce Solr's memory requirements.  The number of documents you push
to Solr is unlikely to have a large effect on the amount of memory that
Solr requires.  Here's some information on this topic:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-04 Thread Erick Erickson
The fact that all the shards have the same leader is somewhat of a red
herring. Until you get hundreds of shards (perhaps across a _lot_ of
collections), the additional load on the leaders is hard to measure.
If you really see this as a problem, consider the BALANCESHARDUNIQUE
and REBALANCELEADERS Collection API commands, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-BalanceSliceUnique
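
Both are plain HTTP calls to the Collections API; a sketch (the host and 
collection name are placeholders):

  # spread the preferredLeader property evenly across the nodes hosting the collection
  http://localhost:8983/solr/admin/collections?action=BALANCESHARDUNIQUE&collection=collection1&property=preferredLeader

  # then ask Solr to make those preferred replicas the actual leaders
  http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=collection1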

That said, your OOM errors indicate you simply have too many Solr
collections doing too many things with too little memory.

bq: All other non-leader server are relying on Leader to finish the
new document index.

This is not the case. The indexing is done independently on all
replicas. What's _probably_ happening here is that the leaders are
spinning off threads to pass the data on to the replicas and you're
running so close to the heap limit that spinning up those threads is
pushing you to OOM errors.

And, if my hypothesis is true you'll soon run into problems on the
non-leaders as you index more and more documents to your collections.
Consider some serious effort in terms of determining your hardware/JVM
needs, see: 
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick