Re: Occasional Solr performance issues

2012-10-29 Thread Dotan Cohen
On Mon, Oct 29, 2012 at 7:04 AM, Shawn Heisey s...@elyograg.org wrote:
 They are indeed Java options.  The first two control the maximum and
 starting heap sizes.  NewRatio controls the relative size of the young and
 old generations, making the young generation considerably larger than it is
 by default.  The others are garbage collector options.  This seems to be a
 good summary:

 http://www.petefreitag.com/articles/gctuning/

 Here's the official Sun (Oracle) documentation on GC tuning:

 http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html


Thank you Shawn! Those are exactly the documents that I need. Google
should hire you to fill in the pages when someone searches for java
garbage collection. Interestingly, I just checked, and bing.com does
list the Oracle page on the first page of results. I shudder to think
that I might have to switch search engines!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:
 Warming doesn't seem to be a problem here -- all your warm times are zero,
 so I am going to take a guess that it may be a heap/GC issue.  I would
 recommend starting with the following additional arguments to your JVM.
 Since I have no idea how solr gets started on your server, I don't know
 where you would add these:

 -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled


Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.
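
For anyone following along: these are JVM arguments, so they belong on the
java command that launches Solr. A minimal sketch, assuming the stock Jetty
start.jar from the Solr example directory (the jar path is a placeholder):

```shell
# Hypothetical startup line; adjust the jar path and any container
# options to match how Solr is actually launched on the server.
java -Xmx4096M -Xms4096M -XX:NewRatio=1 \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:+CMSParallelRemarkEnabled \
     -jar start.jar
```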


 This allocates 4GB of RAM to java, sets up a larger than normal Eden space
 in the heap, and uses garbage collection options that usually fare better in
 a server environment than the default. Java memory management options are
 like religion to some people ... I may start a flamewar with these
 recommendations. ;)  The best I can tell you about these choices: They made
 a big difference for me.


Thanks. I will experiment with them empirically. The first step is to
learn to read the debug info, though. I've been googling for days, but
I must be missing something. Where is the information that I pasted
into pastebin documented?


 I would also recommend switching to a Sun/Oracle jvm.  I have heard that
 previous versions of Solr were not happy on variants like OpenJDK; I have no
 idea whether that might still be the case with 4.0.  If you choose to do
 this, you probably have package choices in Ubuntu.  I know that in Debian,
 the package is called sun-java6-jre ... Ubuntu is probably something
 similar. Debian has a CLI command 'update-java-alternatives' that will
 quickly switch between different java implementations that are installed.
 Hopefully Ubuntu also has this.  If not, you might need the following
 command instead to switch the main java executable:

 update-alternatives --config java


Thanks, I will take a look at the current Oracle JVM.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Shawn Heisey

On 10/28/2012 2:28 PM, Dotan Cohen wrote:

On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:

Warming doesn't seem to be a problem here -- all your warm times are zero,
so I am going to take a guess that it may be a heap/GC issue.  I would
recommend starting with the following additional arguments to your JVM.
Since I have no idea how solr gets started on your server, I don't know
where you would add these:

-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.


They are indeed Java options.  The first two control the maximum and 
starting heap sizes.  NewRatio controls the relative size of the young 
and old generations, making the young generation considerably larger 
than it is by default.  The others are garbage collector options.  This 
seems to be a good summary:


http://www.petefreitag.com/articles/gctuning/

Here's the official Sun (Oracle) documentation on GC tuning:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
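
As a quick back-of-the-envelope illustration of the NewRatio effect (sizes
are approximate; the JVM rounds generation boundaries): the young generation
gets heap/(NewRatio+1), so NewRatio=1 makes it about half of the 4 GB heap,
versus roughly a third at the common server default of NewRatio=2.

```shell
# Approximate young-generation size for -Xmx4096M -XX:NewRatio=1:
#   young = heap / (NewRatio + 1)
heap_mb=4096
new_ratio=1
young_mb=$((heap_mb / (new_ratio + 1)))
echo "young generation: about ${young_mb} MB"   # about 2048 MB
```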

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood wun...@wunderwood.org wrote:
 Please consider never running optimize. That should be called force merge.


Thanks. I have been letting the system run for about two days already
without an optimize. I will let it run a week, then merge to see the
effect.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
I spoke too soon! Whereas three days ago, when the index was new, 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date && du -sh data/ && uptime && free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
 13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers     cached
Mem:         14980       3237      11743          0        284       2224
-/+ buffers/cache:        729      14250
Swap:            0          0          0


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 7:16 AM, Dotan Cohen wrote:

I spoke too soon! Whereas three days ago, when the index was new, 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date && du -sh data/ && uptime && free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
 13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers     cached
Mem:         14980       3237      11743          0        284       2224
-/+ buffers/cache:        729      14250
Swap:            0          0          0


Taking all the information I've seen so far, my bet is on either cache 
warming or heap/GC trouble as the source of your problem.  It's now 
specific information gathering time.  Can you gather all the following 
information and put it into a web paste page, such as pastie.org, and 
reply with the link?  I have gathered the same information from my test 
server and created a pastie example. http://pastie.org/5118979


On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and 
java.specification.vendor information.


After one of the long update times, pause/stop your indexing 
application.  Click on your core in the GUI, open Plugins/Stats, and 
paste the following bits with a header to indicate what each section is:

CACHE-filterCache
CACHE-queryResultCache
CORE-searcher
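
If clicking through the GUI is tedious, the same statistics can also be
pulled over HTTP from Solr 4's mbeans handler; a sketch, with host, port,
and core name as placeholders:

```shell
# Dump cache and searcher stats as JSON (adjust host/port/core to your setup).
curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json'
curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CORE&wt=json'
```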

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey s...@elyograg.org wrote:

 Taking all the information I've seen so far, my bet is on either cache
 warming or heap/GC trouble as the source of your problem.  It's now specific
 information gathering time.  Can you gather all the following information
 and put it into a web paste page, such as pastie.org, and reply with the
 link?  I have gathered the same information from my test server and created
 a pastie example. http://pastie.org/5118979

 On the dashboard of the GUI, it lists all the jvm arguments. Include those.

 Click Java Properties and gather the java.runtime.version and
 java.specification.vendor information.

 After one of the long update times, pause/stop your indexing application.
 Click on your core in the GUI, open Plugins/Stats, and paste the following
 bits with a header to indicate what each section is:
 CACHE-filterCache
 CACHE-queryResultCache
 CORE-searcher

 Thanks,
 Shawn


Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 9:41 AM, Dotan Cohen wrote:

On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and
java.specification.vendor information.

After one of the long update times, pause/stop your indexing application.
Click on your core in the GUI, open Plugins/Stats, and paste the following
bits with a header to indicate what each section is:
CACHE-filterCache
CACHE-queryResultCache
CORE-searcher

Thanks,
Shawn

Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA



Warming doesn't seem to be a problem here -- all your warm times are 
zero, so I am going to take a guess that it may be a heap/GC issue.  I 
would recommend starting with the following additional arguments to your 
JVM.  Since I have no idea how solr gets started on your server, I don't 
know where you would add these:


-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled


This allocates 4GB of RAM to java, sets up a larger than normal Eden 
space in the heap, and uses garbage collection options that usually fare 
better in a server environment than the default. Java memory management 
options are like religion to some people ... I may start a flamewar with 
these recommendations. ;)  The best I can tell you about these choices: 
They made a big difference for me.


I would also recommend switching to a Sun/Oracle jvm.  I have heard that 
previous versions of Solr were not happy on variants like OpenJDK; I 
have no idea whether that might still be the case with 4.0.  If you 
choose to do this, you probably have package choices in Ubuntu.  I know 
that in Debian, the package is called sun-java6-jre ... Ubuntu is 
probably something similar. Debian has a CLI command 
'update-java-alternatives' that will quickly switch between different 
java implementations that are installed.  Hopefully Ubuntu also has 
this.  If not, you might need the following command instead to switch 
the main java executable:


update-alternatives --config java

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-24 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).

 You could pinpoint this by NOT committing explicitly, just set your autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes away
 perhaps?

 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.



Thanks, Erick. I'll play around with different configurations. So far
just removing the periodic optimize command worked wonders. I'll see
how much it helps or hurts to run that daily, or more or less frequently.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-24 Thread Walter Underwood
Please consider never running optimize. That should be called force merge. 

wunder

On Oct 24, 2012, at 3:28 AM, Dotan Cohen wrote:

 On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively 
 committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).
 
 You could pinpoint this by NOT committing explicitly, just set your 
 autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes 
 away
 perhaps?
 
 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.
 
 
 
 Thanks, Erick. I'll play around with different configurations. So far
 just removing the periodic optimize command worked wonders. I'll see
 how much it helps or hurts to run that daily, or more or less frequently.
 
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-23 Thread Erick Erickson
Maybe you've been looking at it but one thing that I didn't see on a fast
scan was that maybe the commit bit is the problem. When you commit,
eventually the segments will be merged and a new searcher will be opened
(this is true even if you're NOT optimizing). So you're effectively committing
every 1-2 seconds, creating many segments which get merged, but more
importantly opening new searchers (which you are getting since you pasted
the message: Overlapping onDeckSearchers=2).

You could pinpoint this by NOT committing explicitly, just set your autocommit
parameters (or specify commitWithin in your indexing program, which is
preferred). Try setting it at a minute or so and see if your problem goes away
perhaps?
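
As a sketch of the commitWithin route (the URL and document are placeholders),
each update request carries its own visibility deadline and the client never
sends an explicit commit:

```shell
# Ask Solr to make these docs searchable within 60 seconds,
# letting it batch the actual commits itself.
curl 'http://localhost:8983/solr/collection1/update?commitWithin=60000' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"doc1","text":"example body"}]'
```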

The NRT stuff happens on soft commits, so you have that option to have the
documents immediately available for search.

Best
Erick

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 committing. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
             total       used       free     shared    buffers     cached
Mem:         14980       2091      12889          0        233       1243
-/+ buffers/cache:        613      14366
Swap:            0          0          0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
When Solr is slow, I'm seeing these in the logs:
[collection1] Error opening new searcher. exceeded limit of
maxWarmingSearchers=2, try again later.
[collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Googling, I found this in the FAQ:
Typically the way to avoid this error is to either reduce the
frequency of commits, or reduce the amount of warming a searcher does
while it's on deck (by reducing the work in newSearcher listeners,
and/or reducing the autowarmCount on your caches)
http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

I happen to know that the script will try to commit once every 60
seconds. How does one reduce the work in newSearcher listeners? What
effect will this have? What effect will reducing the autowarmCount on
caches have?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Rafał Kuć
Hello!

You can check if the long warming is causing the overlapping
searchers. Check Solr admin panel and look at cache statistics, there
should be warmupTime property.

Lowering the autowarmCount should lower the time needed to warm up;
however, you can also look at your warming queries (if you have any)
and see how long they take.
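
For reference, autowarmCount is set per cache in solrconfig.xml; a hedged
sketch with deliberately modest values (sizes and counts are illustrative,
not recommendations -- tune to your own hit rates):

```xml
<!-- solrconfig.xml: lower autowarmCount => faster warm-up, colder caches -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="16"/>
```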

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 When Solr is slow, I'm seeing these in the logs:
 [collection1] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2, try again later.
 [collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 Googling, I found this in the FAQ:
 Typically the way to avoid this error is to either reduce the
 frequency of commits, or reduce the amount of warming a searcher does
 while it's on deck (by reducing the work in newSearcher listeners,
 and/or reducing the autowarmCount on your caches)
 http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

 I happen to know that the script will try to commit once every 60
 seconds. How does one reduce the work in newSearcher listeners? What
 effect will this have? What effect will reducing the autowarmCount on
 caches have?

 Thanks.



Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Are you using Solr 3X? The occasional long commit should no longer
show up in Solr 4.

- Mark

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 committing. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
             total       used       free     shared    buffers     cached
Mem:         14980       2091      12889          0        233       1243
-/+ buffers/cache:        613      14366
Swap:            0          0          0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:02 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

 You can check if the long warming is causing the overlapping
 searchers. Check Solr admin panel and look at cache statistics, there
 should be warmupTime property.


Thank you, I have gone over the Solr admin panel twice and I cannot
find the cache statistics. Where are they?


 Lowering the autowarmCount should lower the time needed to warm up,
 howere you can also look at your warming queries (if you have such)
 and see how long they take.


Thank you, I will look at that!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller markrmil...@gmail.com wrote:
 Are you using Solr 3X? The occasional long commit should no longer
 show up in Solr 4.


Thank you Mark. In fact, this is the production release of Solr 4.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 9:58 AM, Dotan Cohen wrote:
Thank you, I have gone over the Solr admin panel twice and I cannot 
find the cache statistics. Where are they?


If you are running Solr4, you can see individual cache autowarming times 
here, assuming your core is named collection1:


http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

The warmup time for the entire searcher can be found here:

http://server:port/solr/#/collection1/plugins/core?entry=searcher


If you are on an older Solr release, everything is in various sections 
of the stats page.  Do a page search for warmup multiple times to see 
them all:


http://server:port/solr/corename/admin/stats.jsp

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



Thank you Shawn! I can see how I missed that data. I'm reviewing it
now. Solr has a low barrier to entry, but quite a learning curve. I'm
loving it!

I see that the server is using less than 2 GiB of memory, whereas it
is a dedicated Solr server with 16 GiB of memory. I understand that I
can increase the query and document caches to increase performance,
but I worry that this will increase the warm-up time to unacceptable
levels. What is a good strategy for increasing the caches yet
preserving performance after an optimize operation?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Perhaps you can grab a snapshot of the stack traces when the 60 second
delay is occurring?

You can get the stack traces right in the admin ui, or you can use
another tool (jconsole, visualvm, jstack cmd line, etc)
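
From a shell, one way to capture a few dumps while the stall is in progress
(the pgrep pattern is an assumption; match it to however Solr was launched):

```shell
# Take three stack dumps, 10 seconds apart, from the running Solr JVM.
SOLR_PID=$(pgrep -f start.jar)   # adjust the pattern to your launcher
for i in 1 2 3; do
    jstack "$SOLR_PID" > "solr-stack-$i.txt"
    sleep 10
done
```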

- Mark

On Mon, Oct 22, 2012 at 1:47 PM, Dotan Cohen dotanco...@gmail.com wrote:
 On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



 Thank you Shawn! I can see how I missed that data. I'm reviewing it
 now. Solr has a low barrier to entry, but quite a learning curve. I'm
 loving it!

 I see that the server is using less than 2 GiB of memory, whereas it
 is a dedicated Solr server with 16 GiB of memory. I understand that I
 can increase the query and document caches to increase performance,
 but I worry that this will increase the warm-up time to unacceptable
 levels. What is a good strategy for increasing the caches yet
 preserving performance after an optimize operation?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

Thanks. I've refactored so that the index is optimized once per hour
instead of after each batch of commits. When I need to increase the
optimize frequency in the future, I will go through the stack traces.
Thanks!

In any case, the server has an extra 14 GiB of memory available, how
might I make the best use of that for Solr assuming both heavy reads
and writes?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.

Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after 
you stop forcing merges. A forced merge replaces every file, so the OS needs to 
reload everything into file buffers.
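
One illustrative way to watch (and hasten) that reload is to stream the index
files through the page cache after a merge and then check the cached column;
the index path here is an assumption:

```shell
# Read every index file once so the OS page cache is repopulated,
# then see how much memory shows up under "cached".
find /path/to/solr/data/index -type f -exec cat {} + > /dev/null
free -m
```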

wunder

On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
 Thanks. I've refactored so that the index is optimized once per hour
 instead of after each batch of commits. When I need to increase the
 optimize frequency in the future, I will go through the stack traces.
 Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Michael Della Bitta
Has the Solr team considered renaming the optimize function to avoid
leading people down the path of this antipattern?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.

 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.

 wunder

 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

 Thanks. I've refactored so that the index is optimized once per hour
 instead of after each batch of commits. When I need to increase the
 optimize frequency in the future, I will go through the stack traces.
 Thanks!

 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
Lucene already did that:

https://issues.apache.org/jira/browse/LUCENE-3454

Here is the Solr issue:

https://issues.apache.org/jira/browse/SOLR-3141

People over-use this regardless of the name. In Ultraseek Server, it was called 
force merge and we had to tell people to stop doing that nearly every month.

wunder

On Oct 22, 2012, at 1:39 PM, Michael Della Bitta wrote:

 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO 
 and might be the cause of your problem.
 
 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.
 
 wunder
 
 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:
 
 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
  Thanks. I've refactored so that the index is optimized once per hour
  instead of after each batch of commits. When I need to increase the
  optimize frequency in the future, I will go through the stack traces.
  Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Occasional Solr performance issues

2012-10-22 Thread Yonik Seeley
On Mon, Oct 22, 2012 at 4:39 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?

If it were never the right thing to do, it could simply be removed.
The problem is that it's sometimes the right thing to do - but it
depends heavily on the use cases and trade-offs.  The best thing is to
simply document what it does and the cost of doing it.

-Yonik
http://lucidworks.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
of running optimize the stats say the index needs to be reoptimized.
The index still reads and writes fine even in that state, though.


 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.


I don't see that the memory is being used:

$ free -g
             total       used       free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:           0         14
Swap:            0          0          0

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:44 PM, Walter Underwood
wun...@wunderwood.org wrote:
 Lucene already did that:

 https://issues.apache.org/jira/browse/LUCENE-3454

 Here is the Solr issue:

 https://issues.apache.org/jira/browse/SOLR-3141

 People over-use this regardless of the name. In Ultraseek Server, it was 
 called force merge and we had to tell people to stop doing that nearly 
 every month.


Thank you for those links. I commented on the Solr bug. There are some
very insightful comments in there.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 3:11 PM, Dotan Cohen wrote:

On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:

First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
after running optimize that the stats say the index needs to be
reoptimized. Though, the index still reads and writes fine even in
that state.


As soon as you make any change at all to an index, it's no longer 
optimized.  Delete one document, add one document, anything.  Most of 
the time you will not see a performance increase from optimizing an 
index that consists of one large segment and a bunch of very tiny 
segments or deleted documents.



Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after you stop 
forcing merges. A forced merge replaces every file, so the OS needs to reload everything 
into file buffers.


I don't see that the memory is being used:

$ free -g
             total       used       free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:           0         14
Swap:            0          0          0


How big is your index, and did you run this right after a reboot?  If 
you did, then the cache will be fairly empty, and Solr has only read 
enough from the index files to open the searcher.  The number is probably 
too small to show up on a gigabyte scale.  As you issue queries, the 
cached amount will get bigger.  If your index is small enough to fit in 
the 14GB of free RAM that you have, you can manually populate the disk 
cache by going to your index directory and doing 'cat * > /dev/null' 
from the commandline or a script.  The first time you do it, it may go 
slowly, but if you immediately do it again, it will complete VERY fast 
-- the data will all be in RAM.
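
That warm-up trick can be wrapped in a small script. This is just a
sketch of the technique Shawn describes; the example path in the comment
is a placeholder, not a real default, so adjust it to your core's actual
index directory:

```shell
#!/bin/sh
# warm_index: read every file under a directory once so the OS page
# cache ends up holding the index. The bytes are discarded into
# /dev/null; the useful side effect is that they stay resident in RAM.
warm_index() {
    cat "$1"/* > /dev/null
}

# Example invocation (placeholder path -- adjust for your setup):
# warm_index /path/to/solr/data/index
```

Running it a second time and timing it (e.g. with `time`) is a quick way
to confirm the files are now served from the page cache.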


The 'free -m' command in your first email shows cache usage of 1243MB, 
which suggests that maybe your index is considerably smaller than your 
available RAM.  Having loads of free RAM is a good thing for just about 
any workload, but especially for Solr.  Try running the free command 
without the -g so you can see those numbers in kilobytes.


I have seen a tendency towards creating huge caches in Solr because 
people have lots of memory.  It's important to realize that the OS is 
far better at the overall job of caching the index files than Solr 
itself is.  Solr caches are meant to cache result sets from queries and 
filters, not large sections of the actual index contents.  Make the 
caches big enough that you see some benefit, but not big enough to suck 
up all your RAM.


If you are having warm time problems, make the autowarm counts low.  I 
have run into problems with warming on my filter cache, because we have 
filters that are extremely hairy and slow to run. I had to reduce my 
autowarm count on the filter cache to FOUR, with a cache size of 512.  
When it is 8 or higher, it can take over a minute to autowarm.
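
For reference, cache size and autowarm count are set per cache in
solrconfig.xml. Here is a sketch of a filterCache stanza using the
numbers Shawn mentions (size 512, autowarm count 4); the element and
attribute names are the standard Solr ones, but treat the values as a
starting point for your own tuning, not a recommendation:

```xml
<!-- solrconfig.xml, inside the <query> section -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="4"/>
```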


Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:52 AM, Shawn Heisey s...@elyograg.org wrote:
 As soon as you make any change at all to an index, it's no longer
 optimized.  Delete one document, add one document, anything.  Most of the
 time you will not see a performance increase from optimizing an index that
 consists of one large segment and a bunch of very tiny segments or deleted
 documents.


I've since realized that by experimentation. I've probably saved quite
a few minutes of reading time by investing hours of experiment time!


 How big is your index, and did you run this right after a reboot?  If you
 did, then the cache will be fairly empty, and Solr has only read enough from
 the index files to open the searcher.  The number is probably too small to
 show up on a gigabyte scale.  As you issue queries, the cached amount will
 get bigger.  If your index is small enough to fit in the 14GB of free RAM
 that you have, you can manually populate the disk cache by going to your
 index directory and doing 'cat * > /dev/null' from the commandline or a
 script.  The first time you do it, it may go slowly, but if you immediately
 do it again, it will complete VERY fast -- the data will all be in RAM.


The cat trick to get the files in RAM is great. I would not have
thought that would work for binary files.

The index is small, much less than the available RAM, for the time
being. Therefore, I now understand, there was nothing to fill the
cache with. Both 'free' outputs were after the system had been running
for some time.


 The 'free -m' command in your first email shows cache usage of 1243MB, which
 suggests that maybe your index is considerably smaller than your available
 RAM.  Having loads of free RAM is a good thing for just about any workload,
 but especially for Solr.  Try running the free command without the -g so you
 can see those numbers in kilobytes.

 I have seen a tendency towards creating huge caches in Solr because people
 have lots of memory.  It's important to realize that the OS is far better at
 the overall job of caching the index files than Solr itself is.  Solr caches
 are meant to cache result sets from queries and filters, not large sections
 of the actual index contents.  Make the caches big enough that you see some
 benefit, but not big enough to suck up all your RAM.


I see, thanks.


 If you are having warm time problems, make the autowarm counts low.  I have
 run into problems with warming on my filter cache, because we have filters
 that are extremely hairy and slow to run. I had to reduce my autowarm count
 on the filter cache to FOUR, with a cache size of 512.  When it is 8 or
 higher, it can take over a minute to autowarm.


I will have to experiment with the warming. Thank you for the tips.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com