[
https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stanislaw Osinski updated LUCENE-966:
-------------------------------------
Attachment: AnalyzerBenchmark.java
Here is a very simple benchmark I used to test the performance of
StandardAnalyzer, FastAnalyzer and WhitespaceAnalyzer. I ran it on a number of
JVMs and got the following results:
Input: Reuters collection, the one used by contrib/benchmark, only documents
longer than 100 bytes
Machine: AMD Sempron 2600+, 2 GB RAM, Windows XP
Sun 1.4.2 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 15172 ms, 139667 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2438 ms, 869170 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 781 ms, 3547585 tokens/s
Sun 1.4.2 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 24187 ms, 87610 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3157 ms, 671218 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1453 ms, 1906857 tokens/s
Sun 1.5.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 16062 ms, 131928 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2641 ms, 802361 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s
Sun 1.5.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 23891 ms, 88696 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3641 ms, 581993 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1437 ms, 1928089 tokens/s
Sun 1.6.0 Server
org.apache.lucene.analysis.standard.StandardAnalyzer: 13719 ms, 154460 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2484 ms, 853074 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 750 ms, 3694218 tokens/s
Sun 1.6.0 Client
org.apache.lucene.analysis.standard.StandardAnalyzer: 22312 ms, 94972 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2750 ms, 770558 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1297 ms, 2136209 tokens/s
IBM 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 11922 ms, 177741 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3218 ms, 658495 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1407 ms, 1969199 tokens/s
IBM 1.5.0
org.apache.lucene.analysis.standard.StandardAnalyzer: 11797 ms, 179625 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 2968 ms, 713961 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1000 ms, 2770664 tokens/s
BEA 1.4.2
org.apache.lucene.analysis.standard.StandardAnalyzer: 16234 ms, 130530 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 3344 ms, 633683 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 1343 ms, 2063040 tokens/s
BEA 1.5.0 (looks really slow)
org.apache.lucene.analysis.standard.StandardAnalyzer: 33891 ms, 62525 tokens/s
org.apache.lucene.analysis.fast.FastAnalyzer: 12703 ms, 166813 tokens/s
org.apache.lucene.analysis.WhitespaceAnalyzer: 4860 ms, 570095 tokens/s
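To make the figures above easier to compare, here is a minimal sketch of the arithmetic that relates elapsed time, token count and tokens/s. It is not the attached AnalyzerBenchmark.java; the class and method names (ThroughputSketch, tokensPerSecond, speedup) are hypothetical. The token count of 2770664 is an inference from the WhitespaceAnalyzer line that reports exactly 1000 ms, and integer truncation is assumed because it reproduces the reported WhitespaceAnalyzer rates.

```java
public class ThroughputSketch {

    // Tokens per second, truncated to a whole number.
    // Truncation is an assumption; it matches the reported WhitespaceAnalyzer figures.
    public static long tokensPerSecond(long tokens, long elapsedMs) {
        return tokens * 1000L / elapsedMs;
    }

    // How many times faster the candidate run is than the baseline run.
    public static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        // Inferred, not stated in the message: 1000 ms at 2770664 tokens/s.
        long whitespaceTokens = 2770664L;

        // Sun 1.4.2 Server, WhitespaceAnalyzer: 781 ms.
        System.out.println(tokensPerSecond(whitespaceTokens, 781L) + " tokens/s");

        // Sun 1.4.2 Server, StandardAnalyzer (15172 ms) vs FastAnalyzer (2438 ms).
        System.out.printf("speedup: %.2fx%n", speedup(15172L, 2438L));
    }
}
```

With the Sun 1.4.2 Server timings, FastAnalyzer comes out a little over 6 times faster than StandardAnalyzer, which is consistent with the "up to several times" claim in the issue description.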
> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
> Key: LUCENE-966
> URL: https://issues.apache.org/jira/browse/LUCENE-966
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Stanislaw Osinski
> Fix For: 2.3
>
> Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a replacement for
> StandardAnalyzer that is up to several times faster. I will add a patch and
> simple benchmark code shortly.