Re: Performance of never optimizing

Mark Miller Mon, 03 Nov 2008 04:08:06 -0800

Am I missing your benchmark algorithm somewhere? We need it. Something doesn't make sense.


- Mark



Justus Pendleton wrote:

Howdy,
I have a couple of questions regarding some Lucene benchmarking and what the results mean[3]. (Skip to the numbered list at the end if you don't want to read the lengthy exegesis :)
I'm a developer for JIRA[1]. We are currently trying to get a better understanding of Lucene, and our use of it, to cope with the needs of our larger customers. These "large" indexes are only a couple hundred thousand documents, but our problem is compounded by the fact that they have a relatively high rate of modification (=delete+insert of new document) and our users expect these modifications to show up in query results pretty much instantly.
Our current default behaviour is a merge factor of 4. We perform an optimization on the index every 4000 additions. We also perform an optimize at midnight. Our fundamental problem is that these optimizations are locking the index for unacceptably long periods of time, something that we want to resolve for our next major release, hopefully without undermining search performance too badly.
In the Lucene javadoc there is a comment, and a link to a mailing list discussion[2], that suggests applications such as JIRA should never perform optimize but should instead set their merge factor very low.
In an attempt to understand the impact of a) lowering the merge factor from 4 to 2 and b) never, ever optimizing on an index (over the course of years and millions of additions/updates) I wanted to try to benchmark Lucene.
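As a rough illustration of what changing the merge factor from 4 to 2 does when optimize is never called, here is a minimal sketch of a toy logarithmic merge model. It is an assumption-laden simplification, not Lucene's actual merge policy and not the benchmark described in this message: every flush is assumed to create one single-document segment, and whenever mergeFactor segments pile up at one level they are merged into a single segment at the next level. The class and method names (MergeFactorSketch, simulate) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's real merge policy.
class MergeFactorSketch {

    /** Outcome of one simulated run. */
    static final class Result {
        final int segments;        // segments left in the index at the end
        final long docsRewritten;  // documents copied by merges along the way
        Result(int segments, long docsRewritten) {
            this.segments = segments;
            this.docsRewritten = docsRewritten;
        }
    }

    /** Simulates `flushes` single-document flushes with the given merge factor. */
    static Result simulate(int mergeFactor, int flushes) {
        // counts.get(level) = number of segments currently at that level;
        // a segment at level L holds mergeFactor^L documents.
        List<Integer> counts = new ArrayList<>();
        long rewritten = 0;
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            long size = 1; // size of the segment being placed at `level`
            while (true) {
                if (counts.size() <= level) {
                    counts.add(0);
                }
                int c = counts.get(level) + 1;
                if (c < mergeFactor) {
                    counts.set(level, c); // room at this level, no merge needed
                    break;
                }
                // mergeFactor segments at this level: merge them into one
                // segment at the next level, copying every document in them.
                counts.set(level, 0);
                size *= mergeFactor;
                rewritten += size;
                level++;
            }
        }
        int segments = 0;
        for (int c : counts) {
            segments += c;
        }
        return new Result(segments, rewritten);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 4}) {
            Result r = simulate(mf, 4000);
            System.out.println("mergeFactor=" + mf
                    + " segments=" + r.segments
                    + " docsRewritten=" + r.docsRewritten);
        }
    }
}
```

Under this toy model, 4000 flushes leave 6 segments with a merge factor of 2 and 10 segments with a merge factor of 4, while the lower merge factor copies more documents during merges. That is only the trade-off in the abstract (fewer segments to search versus more merge work while indexing); it says nothing about real timings.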
I used the contrib/benchmark framework and wrote a small algorithm that adds documents to an index (using the Reuters doc generator), does a search, does an optimize, then does another search. All the pretty pictures can be seen at:
  http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs
I have several questions; hopefully they aren't overwhelming in their quantity :-/
1. Why does the merge factor of 4 appear to be faster than the merge factor of 2?
2. Why does non-optimized searching appear to be faster than optimized searching once the index hits ~500,000 documents?
3. There appears to be a fairly sizable performance drop across the board around 450,000 documents. Why is that?
4. Searching performance appears to decrease towards a fairly pessimistic 20 searches per second (for a relatively simple search). Is this really what we should expect long-term from Lucene?
5. Does my benchmark even make sense? I am far from an expert on benchmarking so it is possible I'm not measuring what I think I am measuring.
Thanks in advance for any insight you can provide. This is an area that we very much want to understand better as Lucene is a key part of JIRA's success.
Cheers,
Justus
JIRA Developer

[1]: http://www.atlassian.com
[2]: http://www.gossamer-threads.com/lists/lucene/java-dev/47895
[3]: http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


