Re: StandardTokenizer is slowing down highlighting a lot

Mark Miller Wed, 25 Jul 2007 04:29:49 -0700

I would be very interested. I have been playing around with ANTLR to see if it is any faster than JavaCC, but I haven't seen great gains in my simple tests. I had not considered trying JFlex.

I am sure a faster StandardAnalyzer would be greatly appreciated. StandardAnalyzer appears to be widely used and horrendously slow. Even better would be a StandardAnalyzer whose individual recognizers could be enabled or disabled. For example, dropping NUM recognition from the current StandardAnalyzer, if you don't need it, gains about 25% in speed.
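
To make the enable/disable idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API or the real StandardTokenizer grammar: the class name ToggleableTokenizer, the Recognizer enum, and both regular expressions are assumptions made up for illustration. It only shows how leaving a recognizer out changes the token stream, and says nothing about speed.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer whose recognizers can be switched on or off. */
public class ToggleableTokenizer {

    /** Illustrative recognizers; the patterns are simplified assumptions. */
    public enum Recognizer {
        // Declared first so it takes priority over ALPHANUM in the alternation.
        NUM("[0-9]+(?:[.,][0-9]+)+"),
        ALPHANUM("[\\p{L}\\p{Nd}]+");

        final String regex;

        Recognizer(String regex) {
            this.regex = regex;
        }
    }

    private final Pattern pattern;

    public ToggleableTokenizer(Set<Recognizer> enabled) {
        if (enabled.isEmpty()) {
            throw new IllegalArgumentException("enable at least one recognizer");
        }
        // Build one alternation containing only the enabled recognizers.
        StringBuilder sb = new StringBuilder();
        for (Recognizer r : Recognizer.values()) {
            if (!enabled.contains(r)) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append("(?:").append(r.regex).append(')');
        }
        this.pattern = Pattern.compile(sb.toString());
    }

    /** Returns the matched tokens in order; unmatched characters are skipped. */
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Version 2.3.1 of Lucene";
        // All recognizers enabled: the version number stays one token.
        System.out.println(
            new ToggleableTokenizer(EnumSet.allOf(Recognizer.class)).tokenize(text));
        // NUM disabled: the digits fall through to ALPHANUM and are split.
        System.out.println(
            new ToggleableTokenizer(EnumSet.of(Recognizer.ALPHANUM)).tokenize(text));
    }
}
```

With NUM disabled, a string such as "2.3.1" is no longer kept together and comes out as three separate ALPHANUM tokens, so this trade-off only makes sense if you do not need numbers, versions, or similar strings indexed as single terms.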


- Mark

Stanislaw Osinski wrote:

Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
limited by JavaCC speed. You cannot shave much more performance out of
the grammar as it is already about as simple as it gets.
JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years ago :) switched to JFlex, which for roughly the same grammar would sometimes be up to 10x (!) faster. You can have a look at our JFlex specification at:
http://carrot2.svn.sourceforge.net/viewvc/carrot2/trunk/carrot2/components/carrot2-util-tokenizer/src/org/carrot2/util/tokenizer/parser/jflex/JFlexWordBasedParserImpl.jflex?view=markup
This one seems more complex than the StandardAnalyzer's, but it's much faster
anyway.
If anyone is interested, I could prepare a JFlex-based Analyzer equivalent (to the extent possible) to the current StandardAnalyzer, which might offer nice
indexing and highlighting speed-ups.

Best,

Staszek

