Hi all, I was tracking down slowness in the contrib highlighter code, and the seemingly simple tokenStream.next() turned out to be the culprit. I've seen several posts suggesting this as a likely cause. Has anyone looked into speeding up StandardTokenizer? For my documents it's taking about 70ms per document, which is a big ugh! I was thinking I might just cache the TermVectors in memory if that would be faster (rough sketch of what I mean below). Does anyone have another approach to solving this problem?
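
In case it clarifies the question, here's roughly what I mean by the term-vector approach, as a minimal untested sketch against the Lucene 2.x contrib highlighter (the field name "contents" and the wrapper class are just placeholders I made up; TokenSources, TermPositionVector, Highlighter, and QueryScorer are the stock core/contrib APIs as I understand them):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.TokenSources;

public class VectorHighlightSketch {

    // Index time: store the field with positions + offsets so the
    // highlighter can reconstruct tokens without re-analyzing.
    public static Field makeHighlightableField(String text) {
        return new Field("contents", text,
                Field.Store.YES, Field.Index.TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS);
    }

    // Highlight time: rebuild the TokenStream from the stored term
    // vector, so StandardTokenizer never runs on the hot path.
    public static String highlight(IndexReader reader, int docId, Query query)
            throws Exception {
        Document doc = reader.document(docId);
        String text = doc.get("contents");
        TermPositionVector tpv =
                (TermPositionVector) reader.getTermFreqVector(docId, "contents");
        TokenStream ts = TokenSources.getTokenStream(tpv);
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        return highlighter.getBestFragment(ts, text);
    }
}

The tradeoff, as far as I can tell, is index size (WITH_POSITIONS_OFFSETS stores extra data per field) in exchange for skipping analysis entirely at highlight time.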
-M