[ https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517233 ]
Stanislaw Osinski commented on LUCENE-966: ------------------------------------------ To be precise -- I'm not 100% sure that this is a bug in JavaCC (I'll try to browse/ask on their mailing list to find out), but it looks like the scanner generated by JavaCC does not really strictly follow the grammar. I discovered this when you gave the examples of different token produced by JFlex- and JavaCC-based scanners generated from equivalent grammars -- JFlex seemed to behave "correctly", hence different tokens. I was about to start trying to hack the grammar to produce results simiar to StandardAnalyzer (based on a finite set of test cases :). I'll see how it compares performance-wise too. > A faster JFlex-based replacement for StandardAnalyzer > ----------------------------------------------------- > > Key: LUCENE-966 > URL: https://issues.apache.org/jira/browse/LUCENE-966 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis > Reporter: Stanislaw Osinski > Fix For: 2.3 > > Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt, > jflex-analyzer-r560135-patch.txt, jflex-analyzer-r561292-patch.txt, > jflex-analyzer-r561693-compatibility.txt > > > JFlex (http://www.jflex.de/) can be used to generate a faster (up to several > times) replacement for StandardAnalyzer. Will add a patch and a simple > benchmark code in a while. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]