[ 
https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated LUCENE-966:
-------------------------------------

    Attachment: AnalyzerBenchmark.java

Here is a very simple benchmark I used to test the performance of 
StandardAnalyzer, FastAnalyzer and WhitespaceAnalyzer. I ran it on a number of 
JVMs and got the following results:

Input: the Reuters collection used by contrib/benchmark, restricted to documents
longer than 100 bytes

Machine: AMD Sempron 2600+, 2G RAM, Windows XP

Sun 1.4.2 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 15172 ms, 139667 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2438 ms, 869170 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 781 ms, 3547585 tokens/s

Sun 1.4.2 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 24187 ms, 87610 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3157 ms, 671218 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1453 ms, 1906857 tokens/s

Sun 1.5.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 16062 ms, 131928 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2641 ms, 802361 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s

Sun 1.5.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 23891 ms, 88696 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3641 ms, 581993 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1437 ms, 1928089 tokens/s

Sun 1.6.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 13719 ms, 154460 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2484 ms, 853074 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s

Sun 1.6.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 22312 ms, 94972 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2750 ms, 770558 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1297 ms, 2136209 tokens/s

IBM 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 11922 ms, 177741 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3218 ms, 658495 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1407 ms, 1969199 tokens/s

IBM 1.5.0
org.apache.lucene.analysis.standard.StandardAnalyzer: 11797 ms, 179625 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2968 ms, 713961 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1000 ms, 2770664 tokens/s

BEA 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 16234 ms, 130530 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3344 ms, 633683 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1343 ms, 2063040 tokens/s

BEA 1.5.0 (the slowest JVM tested, for all three analyzers)
org.apache.lucene.analysis.standard.StandardAnalyzer: 33891 ms, 62525 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 12703 ms, 166813 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 4860 ms, 570095 tokens/s
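The attached AnalyzerBenchmark.java is not reproduced in this message. As a rough sketch of how figures in the "N ms, M tokens/s" form above can be produced, the self-contained Java class below times one pass of a tokenizer over a list of documents and reports elapsed milliseconds and tokens per second. Everything in it is an assumption for illustration: the `Tokenizer` interface is a hypothetical stand-in for a Lucene `Analyzer` (so the code runs without Lucene), the corpus in `main` is synthetic rather than Reuters, the byte-length filter assumes UTF-8, and the warm-up pass is a guess at good practice, not something the original benchmark is known to do.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Minimal throughput-benchmark sketch; NOT the attached AnalyzerBenchmark.java. */
public class AnalyzerThroughputSketch {

    /** Hypothetical stand-in for a Lucene Analyzer: number of tokens emitted for one document. */
    interface Tokenizer {
        int countTokens(String text);
    }

    /** Elapsed time and total token count for one timed pass over the corpus. */
    static final class Result {
        final long millis;
        final long tokens;

        Result(long millis, long tokens) {
            this.millis = millis;
            this.tokens = tokens;
        }

        /** Throughput as tokens / (ms / 1000), using integer arithmetic. */
        long tokensPerSecond() {
            return millis == 0 ? 0 : tokens * 1000L / millis;
        }
    }

    /** Keeps documents longer than minBytes (UTF-8 assumed), like the "longer than 100 bytes" filter. */
    static List<String> filter(List<String> docs, int minBytes) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            if (doc.getBytes(StandardCharsets.UTF_8).length > minBytes) {
                kept.add(doc);
            }
        }
        return kept;
    }

    /** Runs the tokenizer over every document once and times the whole pass. */
    static Result run(Tokenizer tokenizer, List<String> docs) {
        long tokens = 0;
        long start = System.nanoTime();
        for (String doc : docs) {
            tokens += tokenizer.countTokens(doc);
        }
        long millis = (System.nanoTime() - start) / 1_000_000L;
        return new Result(millis, tokens);
    }

    /** Counts maximal runs of non-whitespace characters, roughly what WhitespaceAnalyzer splits on. */
    static int whitespaceTokens(String text) {
        int count = 0;
        boolean inToken = false;
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && !inToken) {
                count++; // start of a new token
            }
            inToken = !ws;
        }
        return count;
    }

    public static void main(String[] args) {
        // Synthetic corpus (assumption); the real benchmark used the Reuters collection.
        List<String> raw = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            raw.add("document " + i + " with some repeated filler text to exceed the length threshold "
                    + "used by the filter in this sketch, plus a few more words");
        }
        List<String> docs = filter(raw, 100);

        run(AnalyzerThroughputSketch::whitespaceTokens, docs); // warm-up pass, result discarded
        Result r = run(AnalyzerThroughputSketch::whitespaceTokens, docs);
        System.out.println("whitespace sketch: " + r.millis + " ms, " + r.tokensPerSecond() + " tokens/s");
    }
}
```

With a real Lucene analyzer, `countTokens` would wrap the analyzer's token stream and count tokens until it is exhausted. Timing one whole pass, rather than each document separately, keeps timer overhead out of the per-token figure.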



> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
>                 Key: LUCENE-966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-966
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>
>         Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a replacement for
> StandardAnalyzer that is up to several times faster. Will add a patch and
> some simple benchmark code in a while.
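The "up to several times" claim can be checked against the elapsed times in the results above. The hypothetical helper class below (not part of the patch) only redoes arithmetic on those reported numbers: for each JVM it divides the StandardAnalyzer time by the FastAnalyzer time, assuming both analyzers processed the same input on that JVM.

```java
import java.util.Locale;

/** Speedup = StandardAnalyzer ms / FastAnalyzer ms, per JVM, from the reported results. */
public class SpeedupFromReportedTimes {

    static final String[] JVMS = {
        "Sun 1.4.2 Server", "Sun 1.4.2 Client", "Sun 1.5.0 Server", "Sun 1.5.0 Client",
        "Sun 1.6.0 Server", "Sun 1.6.0 Client", "IBM 1.4.2", "IBM 1.5.0",
        "BEA 1.4.2", "BEA 1.5.0"
    };
    // Elapsed times in ms, copied from the results (same order as JVMS).
    static final long[] STANDARD_MS = {15172, 24187, 16062, 23891, 13719, 22312, 11922, 11797, 16234, 33891};
    static final long[] FAST_MS = {2438, 3157, 2641, 3641, 2484, 2750, 3218, 2968, 3344, 12703};

    /** How many times faster FastAnalyzer ran than StandardAnalyzer on JVM i. */
    static double speedup(int i) {
        return (double) STANDARD_MS[i] / FAST_MS[i];
    }

    /** Index of the JVM with the largest (largest = true) or smallest speedup. */
    static int extreme(boolean largest) {
        int best = 0;
        for (int i = 1; i < JVMS.length; i++) {
            boolean better = largest ? speedup(i) > speedup(best) : speedup(i) < speedup(best);
            if (better) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int i = 0; i < JVMS.length; i++) {
            System.out.printf(Locale.ROOT, "%-17s %.1fx%n", JVMS[i], speedup(i));
        }
        int lo = extreme(false);
        int hi = extreme(true);
        System.out.printf(Locale.ROOT, "range: %.1fx (%s) to %.1fx (%s)%n",
                speedup(lo), JVMS[lo], speedup(hi), JVMS[hi]);
    }
}
```

On these figures the ratio ranges from about 2.7x (BEA 1.5.0) to about 8.1x (Sun 1.6.0 Client), and it is above 5x on every Sun JVM.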
