[ https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516745 ]
Michael McCandless commented on LUCENE-966:
-------------------------------------------

I took the patch from here (to use jflex for StandardAnalyzer) and merged it with the patch from LUCENE-969 (re-use Token & TokenStream) to measure the net performance gains.

I measure the time to just tokenize all of Wikipedia using StandardAnalyzer using contrib/benchmark plus patch from LUCENE-967 (test details are described in LUCENE-969). With the jflex patch it takes 646 sec (best of 2 runs); when I then merge in the patch from LUCENE-969 it takes 455 sec. Subtracting off the time to just load all Wikipedia docs (= 112 sec) that gives net additional speedup of 36% (534 sec -> 343 sec) when using LUCENE-969 in addition to jflex.

A couple other things I noticed:

  * The init cost of jflex (StandardTokenizerImpl) seems to be fairly high: when I repeat the above test with smallish docs (100 tokens each) instead, the gain is around 84%. I think this just makes the new reusableTokenStream() in LUCENE-969 important to commit.

  * I'm seeing differing token counts with the jflex StandardAnalyzer vs the current one; I think there is some difference here. I will track down which tokens differ and post back...

> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
>                 Key: LUCENE-966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-966
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>
>         Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt,
>                      jflex-analyzer-r560135-patch.txt, jflex-analyzer-r561292-patch.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a faster (up to several
> times) replacement for StandardAnalyzer. Will add a patch and a simple
> benchmark code in a while.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
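
For readers who want to check the 36% figure quoted in the comment above, the arithmetic can be sketched as follows. This is only an illustration: the three timings come from the comment, while the function and constant names are made up here and are not part of contrib/benchmark or any Lucene patch.

```python
def net_speedup(before_total, after_total, load_time):
    """Subtract the fixed document-loading time from both runs and
    return (net_before, net_after, fractional reduction in net time)."""
    net_before = before_total - load_time
    net_after = after_total - load_time
    return net_before, net_after, (net_before - net_after) / net_before


# Timings in seconds, as reported in the comment (names are hypothetical).
JFLEX_ONLY = 646       # jflex patch alone, best of 2 runs
JFLEX_PLUS_969 = 455   # jflex patch merged with the LUCENE-969 patch
LOAD_ONLY = 112        # time to just load all Wikipedia docs

net_before, net_after, reduction = net_speedup(JFLEX_ONLY, JFLEX_PLUS_969, LOAD_ONLY)
print(f"{net_before} sec -> {net_after} sec, {reduction:.0%} less time")
```

Note that "speedup" here means the reduction in net tokenization time, (534 - 343) / 534. Applying the same formula to the raw totals, without subtracting the 112 sec load time, gives roughly 30% instead.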