On Jul 25, 2007, at 7:19 AM, Stanislaw Osinski wrote:
>> Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
>> limited by JavaCC speed. You cannot shave much more performance out of
>> the grammar, as it is already about as simple as it gets.
> JavaCC is slow indeed. We used it for a while for Carrot2, but then
> (3 years ago :) switched to JFlex, which, for roughly the same grammar,
> would sometimes be up to 10x (!) faster. You can have a look at our
> JFlex specification at:
>
> http://carrot2.svn.sourceforge.net/viewvc/carrot2/trunk/carrot2/components/carrot2-util-tokenizer/src/org/carrot2/util/tokenizer/parser/jflex/JFlexWordBasedParserImpl.jflex?view=markup
>
> This one seems more complex than the StandardAnalyzer's grammar, but
> it's much faster anyway.
> If anyone is interested, I could prepare a JFlex-based Analyzer
> equivalent (to the extent possible) to the current StandardAnalyzer,
> which might offer nice indexing and highlighting speed-ups.
+1. I think a lot of people would be interested in a faster
StandardAnalyzer.