[ https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517222 ]

Michael McCandless commented on LUCENE-966:
-------------------------------------------

If it really comes down to emulating the bugs/oddities in JavaCC, then
I don't think it's worth polluting the new tokenizer with these legacy
bugs, unless one or two cases can be matched exactly without degrading
performance too badly?

Maybe what we should do instead is make this a new tokenizer, call it
StandardAnalyzer2, and deprecate the existing StandardAnalyzer? We
could then remove any and all JavaCC bug emulation from the new one.

This way, people relying on the precise bugs in JavaCC tokenization
are not hurt when upgrading to 2.3 and are given a chance to migrate to
the new one (with one release in which StandardAnalyzer is deprecated).
And new users will use the new, faster one.


> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
>                 Key: LUCENE-966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-966
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>
>         Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt, 
> jflex-analyzer-r560135-patch.txt, jflex-analyzer-r561292-patch.txt, 
> jflex-analyzer-r561693-compatibility.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a faster (up to several 
> times) replacement for StandardAnalyzer. I will add a patch and some simple 
> benchmark code in a while.



