[ https://issues.apache.org/jira/browse/LUCENE-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829163#action_12829163 ]
Fuad Efendi commented on LUCENE-2230:
-------------------------------------

Results after long-running load-stress tests.

I used two boxes: one running SOLR, the other running a simple multithreaded stress simulator (with randomly generated fuzzy query samples). Each box has 2x AMD Opteron 2350 (8 cores per box), 64-bit. I disabled all SOLR caches except the Document Cache (I want isolated tests, so I want to ignore the time taken by disk I/O to load documents).

Throughput rose with the number of load-stress threads (on the "client" computer), then dropped:

9 Threads:
==========
TPS: 200 - 210
Response: 45 - 50 (ms)

10 Threads:
===========
TPS: 200 - 215
Response: 45 - 55 (ms)

12 Threads:
===========
TPS: 180 - 220
Response: 50 - 90 (ms)

16 Threads:
===========
TPS: 60 - 65
Response: 230 - 260 (ms)

This can be explained by CPU-bound processing with 8 cores available: the "top" command on the SOLR instance showed 750% - 790% CPU time (8 cores) at the 3rd step (12 stressing threads), and 200% at the 4th step (16 stressing threads), probably due to network I/O, Tomcat internals, etc.

In production it is better to have Apache HTTPD in front of SOLR, with proxy_ajp (persistent connections) and HTTP caching enabled, and to fine-tune the Tomcat threads according to the use case.

BTW, my best counters for default SOLR/Lucene were:
TPS: 12
Response: 750ms

The "fuzzy" queries were tuned so that the distance threshold was less than or equal to two. I used the "StrikeAMatch" distance...

Thanks,
http://www.tokenizer.ca
+1 416-993-2060(cell)

> Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.
> ----------------------------------------------------------------
>
> Key: LUCENE-2230
> URL: https://issues.apache.org/jira/browse/LUCENE-2230
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 3.0
> Environment: Lucene currently uses a brute-force full-terms scanner and
> calculates the distance for each term.
> The new BKTree structure improves performance on average 20 times when the
> distance is 1, and 3 times when the distance is 3. I tested with an index of
> several million docs and 250,000 terms.
> The new algorithm uses integer distances between objects.
> Reporter: Fuad Efendi
> Attachments: BKTree.java, Distance.java, DistanceImpl.java,
> FuzzyTermEnumNEW.java, FuzzyTermEnumNEW.java
>
> Original Estimate: 0.02h
> Remaining Estimate: 0.02h
>
> W. Burkhard and R. Keller. Some approaches to best-match file searching,
> CACM, 1973
> http://portal.acm.org/citation.cfm?doid=362003.362025
> I was inspired by
> http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick
> Johnson, Google).
> Additionally, the simplified algorithm at
> http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more
> logically correct than Levenshtein distance, and it is 3-5 times faster
> (isolated tests).
> A big list of distance implementations:
> http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm
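
For readers unfamiliar with the structure named in the issue, here is a minimal sketch of a BK-tree over integer distances. It is not the attached BKTree.java: the class name BKTreeSketch and its methods are hypothetical, and plain Levenshtein distance is assumed as the metric because it is a true metric (the pruning step relies on the triangle inequality). The point it illustrates is why a search with a small threshold can skip most terms instead of scanning all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the BKTree.java attached to LUCENE-2230.
final class BKTreeSketch {

    private static final class Node {
        final String term;
        // Children are keyed by their integer distance to this node's term.
        final Map<Integer, Node> children = new HashMap<>();

        Node(String term) {
            this.term = term;
        }
    }

    private Node root;

    // Levenshtein edit distance (assumed metric), two-row dynamic programming.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Walk down from the root, following the child whose key equals the distance.
    void add(String term) {
        if (root == null) {
            root = new Node(term);
            return;
        }
        Node node = root;
        while (true) {
            int d = distance(term, node.term);
            if (d == 0) {
                return; // term is already in the tree
            }
            Node child = node.children.get(d);
            if (child == null) {
                node.children.put(d, new Node(term));
                return;
            }
            node = child;
        }
    }

    // Return every stored term within maxDistance of the query.
    List<String> search(String query, int maxDistance) {
        List<String> hits = new ArrayList<>();
        if (root != null) {
            search(root, query, maxDistance, hits);
        }
        return hits;
    }

    private static void search(Node node, String query, int max, List<String> hits) {
        int d = distance(query, node.term);
        if (d <= max) {
            hits.add(node.term);
        }
        // Triangle inequality: only subtrees keyed in [d - max, d + max] can hold matches.
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            int key = e.getKey();
            if (key >= d - max && key <= d + max) {
                search(e.getValue(), query, max, hits);
            }
        }
    }

    public static void main(String[] args) {
        BKTreeSketch tree = new BKTreeSketch();
        for (String t : new String[] {"book", "books", "cake", "boo", "cape", "cart"}) {
            tree.add(t);
        }
        // With threshold 1 the whole "cake" subtree is skipped without computing distances.
        System.out.println(tree.search("bok", 1));
    }
}
```

The smaller the threshold, the narrower the range of child keys that has to be visited, which is consistent with the larger speedup reported for distance 1 than for distance 3.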