Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Erick Erickson
You're correct on maxWarmingSearchers, and autowarming isn't really a concern
since you're not exceeding maxWarmingSearchers.

Wait... the error is a _tomcat_ error according to the
stack trace. If this were the internals of Solr you'd be seeing
org.apache.solr in there somewhere. I've seen the
"unable to create new native thread" problem when there were a
bazillion replicas on a single Solr JVM, but it doesn't sound like
this is your case.

You say you have a heavy indexing load... Are you by any chance
sending a single document per update? I usually start with sending
docs (assuming SolrJ) with the CloudSolrClient.add(docList) version
and my docList straw-man number is 1,000. If the docs are
very large, that can be too many, but "very large" here is 50K or so.
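
A minimal sketch of the batching idea, written in Python rather than SolrJ so it stays self-contained. The names `batched`, `index_in_batches` and `send_batch` are hypothetical; `send_batch` stands in for whatever call actually ships one list of documents to Solr (in SolrJ that would be the add(docList) call mentioned above).

```python
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch


def index_in_batches(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 1000,
) -> int:
    """Send docs in batches via send_batch (hypothetical); return the request count."""
    requests = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)
        requests += 1
    return requests
```

With 2,500 documents and the straw-man batch size of 1,000, this makes three update requests instead of 2,500. If the documents are very large, pass a smaller `batch_size`.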

In the case I saw, bumping the JVM heap won't help. Changing -Xss (the
per-thread stack size) _might_ help, but the pathological case I was
testing didn't get solved that way.

You're right, there are (somehow) just too many threads being created,
but I've seen very heavy indexing rates without this problem, so I'd
guess there's something magic here.

Of the three choices, (a) is probably best assuming you're already
batching up docs for indexing.

Best,
Erick

On Sun, Aug 7, 2016 at 6:53 PM, Tim Chen  wrote:
> Sorry Erick, forgot to answer your question:
>
> No, I didn't increase the maxWarmingSearchers. It is set to 2. I read somewhere
> that increasing this is a risk.
>
> Just to make sure, you didn't mean the "autowarmCount" in the
> 
> Thanks,
> Tim
>
> Reference:
>
>   size="512"
>   initialSize="512"
>   autowarmCount="0"/>
>
>   size="512"
>   initialSize="512"
>   autowarmCount="32"/>
>
>   size="512"
>   initialSize="512"
>   autowarmCount="0"/>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 6 August 2016 2:31 AM
> To: solr-user
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
> memory
>
> You don't really have to worry that much about memory consumed during
> indexing.
> The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount
> of RAM consumed: when adding a doc, if that limit is exceeded then the buffer
> is flushed.
>
> So you can reduce that number, but its default is 100M and if you're running
> that close to your limits I suspect you'd get, at best, a bit more runway
> before you hit the problem again.
>
> NOTE: that number isn't an absolute limit. IIUC the algorithm is:
>> index a doc to the in-memory structures, check if the limit is exceeded,
>> and flush if so.
>
> So say you were at 99% of your ramBufferSizeMB setting and then indexed a
> ginormous doc: your in-memory stuff might be significantly bigger.
>
> Searching usually is the bigger RAM consumer, so when I say "a bit more 
> runway" what I'm thinking about is that when you start _searching_ the data 
> your memory requirements will continue to grow and you'll be back where you 
> started.
>
> And just as a sanity check: You didn't perchance increase the 
> maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's 
> really a red flag.
>
> Best,
> Erick
>
> On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
>> Thanks Guys. Very very helpful.
>>
>> I will probably look at consolidating 4 Solr servers into 2 bigger/better
>> servers - that gives more memory, and it cuts down the replicas the Leader
>> needs to manage.
>>
>> Also, I may look into writing a script to monitor the tomcat log and, if
>> there is an OOM, kill tomcat, then restart it. A bit dirty, but may work for
>> the short term.
>>
>> I don't know too much about how documents are indexed, and how to save memory
>> from that. Will probably work with a developer on this as well.
>>
>> Many Thanks guys.
>>
>> Cheers,
>> Tim
>>
>> -Original Message-
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, 5 August 2016 4:55 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
>> out of memory
>>
>> On 8/4/2016 8:14 PM, Tim Chen wrote:
>>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>>> like dead down, so other servers can do the election and choose the
>>> new leader. This at least avoids bringing down the whole cluster. Am
>>> I right?
>>
>> Supplementing what Erick told you:
>>
>> When a typical Java program throws OutOfMemoryError, program behavior is 
>> completely unpredictable.  There are programming techniques that can be used 
>> so that behavior IS predictable, but writing that code can be challenging.
>>
>> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
>> option to execute a script when OutOfMemoryError happens.  This script kills 
>> Solr completely.  We are working on adding this capability when running on 
>> Windows.
>>
>>>

RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Sorry Erick, forgot to answer your question:

No, I didn't increase the maxWarmingSearchers. It is set to 2. I read somewhere
that increasing this is a risk.

Just to make sure, you didn't mean the "autowarmCount" in the

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 6 August 2016 2:31 AM
To: solr-user
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of
RAM consumed: when adding a doc, if that limit is exceeded then the buffer is
flushed.

So you can reduce that number, but its default is 100M and if you're running
that close to your limits I suspect you'd get, at best, a bit more runway
before you hit the problem again.

NOTE: that number isn't an absolute limit. IIUC the algorithm is:
> index a doc to the in-memory structures, check if the limit is exceeded,
> and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a
ginormous doc: your in-memory stuff might be significantly bigger.
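
The add-then-check order described above can be sketched as a toy model. This is only an illustration of that description, not Lucene's actual memory accounting; `simulate_ram_buffer` and its parameters are hypothetical names.

```python
from typing import Iterable, Tuple


def simulate_ram_buffer(
    doc_sizes_mb: Iterable[float], ram_buffer_mb: float = 100.0
) -> Tuple[float, int]:
    """Return (peak_buffered_mb, flush_count) for a stream of document sizes.

    Toy model: each doc is added to the buffer first, and only then is the
    limit checked, so the peak can overshoot ram_buffer_mb by up to one doc.
    """
    buffered = 0.0
    peak = 0.0
    flushes = 0
    for size in doc_sizes_mb:
        buffered += size                # index the doc into memory
        peak = max(peak, buffered)      # overshoot is visible here
        if buffered > ram_buffer_mb:    # the check happens after the add
            flushes += 1
            buffered = 0.0              # flush empties the buffer
    return peak, flushes
```

For example, with a 100 MB limit, a buffer sitting at 99 MB that then receives a 50 MB document peaks at 149 MB before the single flush happens.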

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" 
what I'm thinking about is that when you start _searching_ the data your memory 
requirements will continue to grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the 
maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's really 
a red flag.

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidating 4 Solr servers into 2 bigger/better
> servers - that gives more memory, and it cuts down the replicas the Leader
> needs to manage.
>
> Also, I may look into writing a script to monitor the tomcat log and, if
> there is an OOM, kill tomcat, then restart it. A bit dirty, but may work for
> the short term.
>
> I don't know too much about how documents are indexed, and how to save memory
> from that. Will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
> out of memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am
>> I right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is 
> completely unpredictable.  There are programming techniques that can be used 
> so that behavior IS predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
> option to execute a script when OutOfMemoryError happens.  This script kills 
> Solr completely.  We are working on adding this capability when running on 
> Windows.
>
>> 2, Apparently we should not be pushing too many documents to Solr, how
>> do you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: Increase the heap or 
> reduce Solr's memory requirements.  The number of documents you push to Solr 
> is unlikely to have a large effect on the amount of memory that Solr 
> requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn


RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Hi Erick, Shawn,

Thanks for following this up.

1,
For some reason, ramBufferSizeMB in our solrconfig.xml is set to 32MB, not
100MB.

In that case, considering we have 10G for the JVM, my understanding is we should
not run out of memory due to a large number of documents being added to Solr.

Just to make sure I understand it correctly: the documents being added to Solr
will be stored in an internal queue in Solr, and Solr will only use that 32MB
(or 99% of 32MB + the memory for one extra document) for indexing documents. The
documents in the queue will be indexed one by one.

2,
Based on our tomcat (Solr) access_log and website peak hours, the cluster
failure is not likely to have been caused by _searching_ traffic. E.g., we can
see many more Solr requests with the 'update' keyword, but only the usual number
of requests with the 'select' keyword.

3,
Now, this leads me to the only reason I can think of (you mentioned this
earlier as well):
Since each shard has 4 replicas in our setup, when a large number of
documents are being added, the Leader will create a lot of threads to send the
documents to the other replica servers. All these threads are what consumed all
the memory on the Leader server, leading to OOM.

If my assumption is right, the ways to fix this issue are to:
a): still limit the documents being added to Solr
b): change to 2 replicas for each shard (loss of data reliability, but..)
c): bump up server memory.
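
To compare (a) and (b) roughly, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model in which every incoming update request is forwarded once to each other replica of the shard; Solr's real forwarding may buffer or combine requests, so treat the numbers as an upper-bound illustration. `forwarded_requests` and its parameters are hypothetical names.

```python
import math


def forwarded_requests(num_docs: int, batch_size: int, other_replicas: int) -> int:
    """Requests a leader forwards under the simplified one-forward-per-replica model."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    incoming = math.ceil(num_docs / batch_size)  # update requests hitting the leader
    return incoming * other_replicas             # one forward per other replica
```

Under this model, 100,000 documents sent one per request with 3 other replicas mean 300,000 forwarded requests, while batches of 1,000 mean 300. Dropping to 1 other replica with single-document updates still leaves 100,000, which is why batching changes the picture far more than the replica count does.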

Am I going the right way? Any advice and suggestions are much appreciated!!

Also attached part of catalina.out OOM log for reference:

Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6861" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6671" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
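
A rough sketch of how a log like this could be summarized, for instance as the first half of a monitoring script. It assumes each "Exception in thread" entry sits on a single line, as it would in catalina.out before mail wrapping; `summarize_oom` and `OOM_LINE` are hypothetical names. Grouping by the message matters because "unable to create new native thread" points at thread exhaustion, whereas "Java heap space" would point at the heap.

```python
import re
from collections import Counter
from typing import Iterable

# Matches: Exception in thread "<name>" java.lang.OutOfMemoryError: <reason>
OOM_LINE = re.compile(
    r'Exception in thread "(?P<thread>[^"]+)" '
    r"java\.lang\.OutOfMemoryError:\s*(?P<reason>.*)"
)


def summarize_oom(lines: Iterable[str]) -> Counter:
    """Count OutOfMemoryError entries per (thread pool, reason)."""
    counts: Counter = Counter()
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            # Drop the trailing worker number: http-bio-8983-exec-6571 -> http-bio-8983-exec
            pool = re.sub(r"-\d+$", "", match.group("thread"))
            counts[(pool, match.group("reason").strip())] += 1
    return counts
```

Run over the three entries above, this reports three errors for the `http-bio-8983-exec` pool, all with the reason "unable to create new native thread".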

Many thanks,
Tim


Unique key field type in solr 6.1 schema

2016-08-07 Thread Bharath Kumar
Hi All,

I have an issue with cross data center replication: when we delete a
document by id on the main site, the document on the target site is not deleted.
The id field, which is the unique key field in my schema, is
configured as "long".

If I change the type to "string" it works fine. Is there any issue with using
long? We migrated from 4.4 to 6.1, and we had the id field as long.
Can you please help me with this? I'd really appreciate your help.

*I see the below error on the target site:-*

o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid Number:
    at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
    at org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
    at org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Shalin Shekhar Mangar
Hi Naveen,

When you cleaned up the zk data, it also removed the configuration files
that are stored in Zookeeper. These configuration files are used by Solr
and therefore they can no longer be found. You need to upload those
configuration files again to ZK using:

bin/solr zk upconfig -d /path/to/your/conf/directory -n sample -z zkhost_connect_string

On Sun, Aug 7, 2016 at 3:14 PM, Naveen Pajjuri wrote:

> Here sample is the name of my collection.
>
> Thanks
>
> On Sun, Aug 7, 2016 at 3:10 PM, Naveen Pajjuri wrote:
>
> > Hi,
> > I'm trying to move to solr-6.1.0. it was working fine and i cleaned up zk
> > data (version folder) and restarted solr and zookeeper. I started getting
> > this error.
> >
> >    - *sample_shard1_replica1:*
> >    org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
> >    Specified config does not exist in ZooKeeper: sample.
> >
> >
> > Please let me know what i'm missing.
> >
> > Regards,
> > Naveen Reddy.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


RE: Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Ritesh Kumar (Avanade)
Hi Naveen,

I had the same issue; I did the steps below to fix it.

1.   Stop the SOLR service across all Solr VMs.
2.   Stop the ZooKeeper service on all ZK VMs.
3.   Rename all the log files again eg: _log.1
4.   Start the ZooKeeper service on all ZK VMs.
5.   Start the SOLR service across all Solr VMs.
6.   Open the SOLR dashboard across all the 3 Solr VMs and they should not
     show any error.
7.   If the error persists, restart all the boxes.

Ritesh K

Infrastructure Sr. Engineer – Jericho Team

Sales & Marketing Digital Services

t +91-7799936921   v-kur...@microsoft.com



-Original Message-
From: Naveen Pajjuri [mailto:pajjuri.re...@myntra.com]
Sent: 07 August 2016 15:14
To: solr-user@lucene.apache.org
Subject: Re: Issue faced while re-starting solr 6.1.0 after cleaning zk data.

Here sample is the name of my collection.

Thanks

On Sun, Aug 7, 2016 at 3:10 PM, Naveen Pajjuri <pajjuri.re...@myntra.com> wrote:

> Hi,
> I'm trying to move to solr-6.1.0. it was working fine and i cleaned up
> zk data (version folder) and restarted solr and zookeeper. I started
> getting this error.
>
>    - *sample_shard1_replica1:*
>    org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
>    Specified config does not exist in ZooKeeper: sample.
>
> Please let me know what i'm missing.
>
> Regards,
> Naveen Reddy.


Re: Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Naveen Pajjuri
Here sample is the name of my collection.

Thanks

On Sun, Aug 7, 2016 at 3:10 PM, Naveen Pajjuri wrote:

> Hi,
> I'm trying to move to solr-6.1.0. it was working fine and i cleaned up zk
> data (version folder) and restarted solr and zookeeper. I started getting
> this error.
>
>    - *sample_shard1_replica1:*
>    org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
>    Specified config does not exist in ZooKeeper: sample.
>
>
> Please let me know what i'm missing.
>
> Regards,
> Naveen Reddy.
>


Issue faced while re-starting solr 6.1.0 after cleaning zk data.

2016-08-07 Thread Naveen Pajjuri
Hi,
I'm trying to move to solr-6.1.0. It was working fine until I cleaned up the zk
data (version folder) and restarted Solr and ZooKeeper. Then I started getting
this error.


   - *sample_shard1_replica1:*
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
   Specified config does not exist in ZooKeeper: sample.


Please let me know what i'm missing.

Regards,
Naveen Reddy.