[ https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stanislaw Osinski updated LUCENE-966:
-------------------------------------

    Attachment: AnalyzerBenchmark.java

Here is a very simple benchmark I used to test the performance of StandardAnalyzer, FastAnalyzer and WhitespaceAnalyzer. I ran it on a number of JVMs and got the following results:

Input: Reuters collection, the one used by contrib/benchmark, only documents longer than 100 bytes
Machine: AMD Sempron 2600+, 2G RAM, Windows XP

Sun 1.4.2 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 15172 ms, 139667 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2438 ms, 869170 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 781 ms, 3547585 tokens/s

Sun 1.4.2 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 24187 ms, 87610 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3157 ms, 671218 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1453 ms, 1906857 tokens/s

Sun 1.5.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 16062 ms, 131928 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2641 ms, 802361 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s

Sun 1.5.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 23891 ms, 88696 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3641 ms, 581993 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1437 ms, 1928089 tokens/s

Sun 1.6.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 13719 ms, 154460 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2484 ms, 853074 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s

Sun 1.6.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 22312 ms, 94972 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2750 ms, 770558 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1297 ms, 2136209 tokens/s

IBM 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 11922 ms, 177741 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3218 ms, 658495 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1407 ms, 1969199 tokens/s

IBM 1.5.0
org.apache.lucene.analysis.standard.StandardAnalyzer: 11797 ms, 179625 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2968 ms, 713961 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1000 ms, 2770664 tokens/s

BEA 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 16234 ms, 130530 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3344 ms, 633683 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1343 ms, 2063040 tokens/s

BEA 1.5.0 (looks really slow)
org.apache.lucene.analysis.standard.StandardAnalyzer: 33891 ms, 62525 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 12703 ms, 166813 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 4860 ms, 570095 tokens/s


> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
>                 Key: LUCENE-966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-966
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>
>         Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a faster (up to several
> times) replacement for StandardAnalyzer. Will add a patch and a simple
> benchmark code in a while.
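
For readers who want to see how the "ms, tokens/s" figures above relate to each other, here is a minimal sketch. It is not the attached AnalyzerBenchmark.java, whose contents are not shown in this message. The class name ThroughputSketch and its methods are hypothetical, and a plain whitespace split via java.util.StringTokenizer stands in for running a Lucene analyzer, so it has no Lucene dependency. The sketch assumes throughput is computed as tokens * 1000 / elapsed ms, truncated to a whole number, which appears consistent with the WhitespaceAnalyzer rows (the 1000 ms IBM 1.5.0 run implies 2770664 tokens).

```java
import java.util.StringTokenizer;

/** Hypothetical sketch of a tokens/s measurement; not the attached benchmark. */
final class ThroughputSketch {

    /** Counts whitespace-separated tokens; a stand-in for running an analyzer. */
    static long countTokens(String[] documents) {
        long tokens = 0;
        for (String doc : documents) {
            tokens += new StringTokenizer(doc).countTokens();
        }
        return tokens;
    }

    /** Tokens per second, truncated to a whole number (assumed formula). */
    static long tokensPerSecond(long tokens, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return tokens * 1000L / elapsedMs;
    }

    /** How many times faster the quicker run is than the slower one. */
    static double speedup(long slowMs, long fastMs) {
        return (double) slowMs / fastMs;
    }

    public static void main(String[] args) {
        // Tiny made-up input; the runs above used the Reuters collection.
        String[] docs = { "a faster JFlex-based replacement", "for StandardAnalyzer" };

        long start = System.nanoTime();
        long tokens = countTokens(docs);
        // Clamp to 1 ms so very short runs do not divide by zero.
        long elapsedMs = Math.max(1L, (System.nanoTime() - start) / 1_000_000L);

        System.out.println(tokens + " tokens, " + elapsedMs + " ms, "
                + tokensPerSecond(tokens, elapsedMs) + " tokens/s");

        // Elapsed times taken from the Sun 1.6.0 Server rows above.
        System.out.printf("FastAnalyzer vs StandardAnalyzer: %.1fx%n",
                speedup(13719L, 2484L));
    }
}
```

With the Sun 1.6.0 Server times, speedup(13719, 2484) comes out at roughly 5.5, which is in line with the "up to several times" claim in the issue description. A real measurement would also need warm-up iterations and repeated runs, which this sketch leaves out.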