Re: Why is SolrCloud doing a full copy of the index?

2013-05-06 Thread Michael Della Bitta
Hi Shawn,

Thanks a lot for this entry!

I'm wondering, when you say "Garbage collections that happen more often
than ten or so times per minute may be an indication that the heap size is
too small," do you mean *any* collections, or just full collections?


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sat, May 4, 2013 at 1:55 PM, Shawn Heisey s...@elyograg.org wrote:

 On 5/4/2013 11:45 AM, Shawn Heisey wrote:
  Advance warning: this is a long reply.

 I have condensed some relevant performance problem information into the
 following wiki page:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Anyone who has additional information for this page, feel free to add
 it.  I hope I haven't made too many mistakes!

 Thanks,
 Shawn




Re: Why is SolrCloud doing a full copy of the index?

2013-05-06 Thread Shawn Heisey

On 5/6/2013 1:39 PM, Michael Della Bitta wrote:

Hi Shawn,

Thanks a lot for this entry!

I'm wondering, when you say "Garbage collections that happen more often
than ten or so times per minute may be an indication that the heap size is
too small," do you mean *any* collections, or just full collections?


My gut reaction is any collection, but in extremely busy environments a 
rate of ten per minute might be a very slow day on a setup that's 
working perfectly.


As I wrote that particular bit, I was thinking that any number I put 
there was probably wrong for some large subset of users, but I wanted to 
finish putting down my thoughts and improve it later.


Thanks,
Shawn



Re: Why is SolrCloud doing a full copy of the index?

2013-05-06 Thread Otis Gospodnetic
Hi,

I just looked at the SPM monitoring we have for the Solr servers that run
search-lucene.com.  One of them has 1-2 collections/minute.  Another
one is closer to 10.  These are both small servers with small JVM heaps.
Here is a graph of one of them:

https://apps.sematext.com/spm/s/104ppwguao

Just looked at some other Java servers we have running, not Solr, and
I see close to 60 small collections per minute.

So these numbers will vary a lot depending on the heap size and other
JVM settings, as well as the actual code/usage. :)
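
If you don't have monitoring in place, the JVM's own GC log is enough for
a rough count. A minimal sketch, assuming a HotSpot JVM (these are standard
HotSpot logging flags; the log path and start command are just examples):

  # Start Solr with GC logging enabled
  java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
       -Xloggc:/var/log/solr/gc.log -jar start.jar

  # With PrintGCDateStamps each GC event line begins with a timestamp,
  # so truncating to the minute and counting duplicates gives a
  # collections-per-minute figure:
  grep '^20' /var/log/solr/gc.log | cut -c1-16 | uniq -c

  # Full collections specifically:
  grep -c 'Full GC' /var/log/solr/gc.log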

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, May 6, 2013 at 4:39 PM, Shawn Heisey s...@elyograg.org wrote:
 On 5/6/2013 1:39 PM, Michael Della Bitta wrote:

 Hi Shawn,

 Thanks a lot for this entry!

 I'm wondering, when you say "Garbage collections that happen more often
 than ten or so times per minute may be an indication that the heap size is
 too small," do you mean *any* collections, or just full collections?


 My gut reaction is any collection, but in extremely busy environments a rate
 of ten per minute might be a very slow day on a setup that's working
 perfectly.

 As I wrote that particular bit, I was thinking that any number I put there
 was probably wrong for some large subset of users, but I wanted to finish
 putting down my thoughts and improve it later.

 Thanks,
 Shawn



Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Erick Erickson
Second the thanks

Erick

On Sat, May 4, 2013 at 6:08 PM, Lance Norskog goks...@gmail.com wrote:
 Great! Thank you very much Shawn.


 On 05/04/2013 10:55 AM, Shawn Heisey wrote:

 On 5/4/2013 11:45 AM, Shawn Heisey wrote:

 Advance warning: this is a long reply.

 I have condensed some relevant performance problem information into the
 following wiki page:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Anyone who has additional information for this page, feel free to add
 it.  I hope I haven't made too many mistakes!

 Thanks,
 Shawn




Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Kumar Limbu
Thanks for the replies. It is really appreciated.

Based on the replies, it seems like upgrading to the latest version of Solr
will probably resolve this issue.

We also update quite frequently, every 5 minutes. We will try setting this
to a higher interval and see if that helps.

We will also try increasing the servlet timeout and see if that resolves the
issue.

Among the other suggestions, we already tried increasing the zkClientTimeout
from 15 seconds to 30 seconds, but that didn't seem to help. What would you
recommend as a good value to try?

A few more details about our system: we are running this on machines with
16GB of RAM, the servers are 64-bit, and we use SSD disks.

Also, since we are already using 4.0 in our production environment with the
aforementioned 3-server setup, how should we go about upgrading to the
latest version (4.3)? Do we need to do a full reindex of our data, or is the
index compatible between these versions?

We will try out the suggestions and will post later if any of them help us
resolve the issue.

Again, thanks for the reply.





Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Kristopher Kane
 
 Advance warning: this is a long reply.
 

Awesome Shawn.  Thanks!





Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Erick Erickson
Hmmm, there was a problem with replication where it would do a full
copy unnecessarily; that was fixed in 4.2 (I think). Frankly, I don't
know for certain whether it was a problem in 4.0.

It's also possible that your servlet containers have a short enough
timeout that you're occasionally just getting connection timeouts, so
lengthening that interval might be worthwhile, but that's a stab in
the dark.
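
If you're on the Jetty that ships with the Solr example, the relevant knob
is probably the connector's maxIdleTime in etc/jetty.xml. A sketch only
(your container and connector class may differ; the value shown is just an
example, in milliseconds):

  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
        <!-- how long an idle connection is kept open before timing out -->
        <Set name="maxIdleTime">120000</Set>
      </New>
    </Arg>
  </Call>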

Best
Erick

On Sat, May 4, 2013 at 4:06 AM, Kumar Limbu kumarli...@gmail.com wrote:
 We have Solr set up on 3 machines with only a single shard. We are using Solr
 4.0 and currently have around 7 million documents in our index. The size of
 our index is around 25 GB. We have a zookeeper ensemble of 3 zookeeper
 instances.

 Let's call the servers in our setup servers (A), (B) and (C). All updates to
 Solr go via server (C). Searches are performed on servers (A) and (B). The
 updates are normally propagated incrementally from server (C) to the other 2
 servers.  Intermittently we have noted that servers (A) and (B) make a
 full copy of the index from server (C). This is not ideal, because when this
 happens performance suffers. This occurs quite randomly and can occur on any
 of the other 2 nodes, i.e. (A) and (B).

 On server (C), which is the leader, we see errors like the following. We
 suspect this might be the reason why a full index copy occurs on the other
 nodes, but we haven't been able to find out why this error is occurring.
 There is no connectivity issue with the servers.

 See the stacktrace below:

 SEVERE: shard update error StdNode:
 http://serverA/solr/rn0/:org.apache.solr.client.solrj.SolrServerException:
 IOException occured when talking to server at: http://serverA/solr/rn0
         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
         at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
         at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
         at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
         at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
         at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
         at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
         ... 11 more

 If anyone can help us troubleshoot this problem, we will really appreciate
 the help. If there are any questions regarding our setup, or if further
 information regarding the error is needed, please let me know.






Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Shawn Heisey
On 5/4/2013 2:06 AM, Kumar Limbu wrote:
 We have Solr set up on 3 machines with only a single shard. We are using Solr
 4.0 and currently have around 7 million documents in our index. The size of
 our index is around 25 GB. We have a zookeeper ensemble of 3 zookeeper
 instances.

 Let's call the servers in our setup servers (A), (B) and (C). All updates to
 Solr go via server (C). Searches are performed on servers (A) and (B). The
 updates are normally propagated incrementally from server (C) to the other 2
 servers.  Intermittently we have noted that servers (A) and (B) make a
 full copy of the index from server (C). This is not ideal, because when this
 happens performance suffers. This occurs quite randomly and can occur on any
 of the other 2 nodes, i.e. (A) and (B).

 On server (C), which is the leader, we see errors like the following. We
 suspect this might be the reason why a full index copy occurs on the other
 nodes, but we haven't been able to find out why this error is occurring.
 There is no connectivity issue with the servers.

Advance warning: this is a long reply.

The first thing that jumped out at me was the Solr version.  Version 4.0
was brand new in October of last year.  It's a senior citizen now.  It
has a lot of bugs, particularly in SolrCloud stability.  I would
recommend upgrading to at least 4.2.1.

Version 4.3.0 (the fourth release since 4.0) is quite literally about to be
unveiled.  It is already on a lot of download mirrors; the announcement
is due any time now.

Now for things to consider that don't involve upgrading, but might still
be issues after upgrading:

You might be able to make your system more stable by increasing your
zkClientTimeout.  A typical example value for this setting is 15
seconds. Next we will discuss why you might be exceeding the timeout:

Slow operations, especially on commits, can be responsible for exceeding
timeouts.  One of the things you can do to decrease commit time is to
lower the autowarmCount on your Solr caches.  You can also decrease the
frequency of your commits.
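
For reference, here is a sketch of where zkClientTimeout and autowarmCount
live in a 4.x install. The placement follows the stock example configs, but
the values are only illustrative:

  <!-- solr.xml (legacy 4.x format): zkClientTimeout on the cores element -->
  <solr persistent="true">
    <cores adminPath="/admin/cores"
           zkClientTimeout="${zkClientTimeout:30000}">
      <core name="collection1" instanceDir="collection1" />
    </cores>
  </solr>

  <!-- solrconfig.xml: a lower autowarmCount means less work on commit -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="16" />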

A 25GB index is relatively large, and requires a lot of memory for
proper operation.  The reason it requires a lot of memory is because
Solr is very reliant on the operating system disk cache, which uses free
memory.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
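
On Linux you can see at a glance how much of your memory the OS currently
has available for that cache (a quick check, not Solr-specific):

  # The 'cached' column is the page cache holding your index files; it is
  # reclaimable, so the free value in the '-/+ buffers/cache' row is the
  # number that matters.
  free -m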

With a 25GB index, you want to have between 15 and 25GB of memory over
and above the memory that your programs use.  You would probably want to
give the Java heap for Solr between 4 and 8GB.  For a dedicated Solr
server with your index, a really good amount of total system memory
would be 32GB, with 24GB being a reasonable starting point.

It should go without saying that you need a 64 bit server, a 64 bit
operating system, and 64 bit Java for all this to work correctly.  32
bit software is not good at dealing with large amounts of memory, and 32
bit Java cannot have a heap size larger than 2GB.
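
You can verify which Java you are actually running from its version banner:

  # 64-bit HotSpot builds report "64-Bit Server VM" in the last line;
  # 32-bit builds report just "Client VM" or "Server VM".
  java -version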

If you upgrade to 4.2.1 or later and reindex, your index size will drop
due to compression of certain pieces.  Those pieces don't normally
affect minimum memory requirements very much, so your free memory
requirement will still probably be at least 15GB.

Unless you are using a commercial JVM with low-pause characteristics
(like Zing), a heap of 4GB or larger can give you problems with
stop-the-world GC pauses.  A large heap is unfortunately required with a
large index.  The default collector that Java gives you is a *terrible*
choice for large heaps in general and Solr in particular.  Even changing
to the CMS collector may not be enough - more tuning is required.
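
As a starting point only, here is a sketch of the kind of tuning involved.
These are standard HotSpot flags, but the heap size and thresholds below
are illustrative; the right values depend entirely on your index and
workload:

  # 6GB heap, CMS with earlier-than-default initiation so concurrent
  # cycles start before the old generation fills up
  java -Xms6g -Xmx6g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+CMSParallelRemarkEnabled \
       -XX:CMSInitiatingOccupancyFraction=75 \
       -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar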

Thanks,
Shawn



Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Shawn Heisey
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
 Advance warning: this is a long reply.

I have condensed some relevant performance problem information into the
following wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Anyone who has additional information for this page, feel free to add
it.  I hope I haven't made too many mistakes!

Thanks,
Shawn



Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Lance Norskog

Great! Thank you very much Shawn.

On 05/04/2013 10:55 AM, Shawn Heisey wrote:

On 5/4/2013 11:45 AM, Shawn Heisey wrote:

Advance warning: this is a long reply.

I have condensed some relevant performance problem information into the
following wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Anyone who has additional information for this page, feel free to add
it.  I hope I haven't made too many mistakes!

Thanks,
Shawn