[ 
https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517184
 ] 

Stanislaw Osinski commented on LUCENE-966:
------------------------------------------

Thanks for the additional test cases. I guess the biggest problem here is that 
the scanner generated by JavaCC doesn't seem to strictly follow the specification 
(see https://issues.apache.org/jira/browse/LUCENE-966#action_12516893), so I'd 
need to emulate possible JavaCC "bugs" that I'm not aware of at the moment (I'm 
not an expert on lexical scanner generation either, not yet at least :) ). I can 
add some workarounds to the grammar to make the known incompatibility examples 
work, but this won't guarantee consistency in general.
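
To illustrate why fixing only the known examples does not guarantee 
consistency, here is a minimal sketch of a differential check that runs two 
tokenizers over the same inputs and reports where they disagree. The class 
name, the method name and both tokenizer functions are hypothetical stand-ins 
(plain string splitting), not the Lucene, JavaCC or JFlex APIs; a real check 
would plug in the two analyzers and a much larger set of inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TokenizerCompat {

    /**
     * Returns the inputs on which the two tokenizers produce different
     * token lists. An empty result only means that no difference was found
     * on these particular inputs, not that the tokenizers are equivalent.
     */
    public static List<String> findMismatches(
            Function<String, List<String>> reference,
            Function<String, List<String>> candidate,
            List<String> inputs) {
        List<String> mismatches = new ArrayList<>();
        for (String input : inputs) {
            if (!reference.apply(input).equals(candidate.apply(input))) {
                mismatches.add(input);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the old scanner: split on whitespace only.
        Function<String, List<String>> reference =
                s -> Arrays.asList(s.trim().split("\\s+"));
        // Hypothetical stand-in for the new scanner: also split on hyphens.
        Function<String, List<String>> candidate =
                s -> Arrays.asList(s.trim().split("[\\s-]+"));

        List<String> inputs = Arrays.asList("hello world", "jflex-based scanner");
        System.out.println(findMismatches(reference, candidate, inputs));
    }
}
```

With these stand-ins only the hyphenated input is reported, which shows the 
limitation: any class of input missing from the list goes unnoticed.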

As a side note, it's a shame there's no record of which version of JavaCC was 
used to generate the scanner for the original StandardAnalyzer. I'm also 
curious whether the current JavaCC grammar would produce the same results with 
the newest version of the generator (4.0 I guess) -- I'll try to check that.

Anyway, I'll take another, more in-depth look at the problem. In the 
worst-case scenario, we can keep the StandardAnalyzer as it was and add the new 
one next to it so that people have a choice (on the other hand, this might 
be a problem for the quality tests).

> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>
>                 Key: LUCENE-966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-966
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>
>         Attachments: AnalyzerBenchmark.java, jflex-analyzer-patch.txt, 
> jflex-analyzer-r560135-patch.txt, jflex-analyzer-r561292-patch.txt, 
> jflex-analyzer-r561693-compatibility.txt
>
>
> JFlex (http://www.jflex.de/) can be used to generate a faster (up to several 
> times) replacement for StandardAnalyzer. I will add a patch and some simple 
> benchmark code in a while.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
