Hi all, I was tracking down slowness in the contrib highlighter code, and the seemingly simple tokenStream.next() turned out to be the culprit. I've seen several posts suggesting this as a likely cause. Has anyone looked into speeding up StandardTokenizer? For my documents it's taking about 70ms per document, which is a big ugh! I was thinking I might just cache the TermVectors in memory if that would be faster (rough sketch of what I mean below). Does anyone have another approach to solving this problem?
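
In case it clarifies the question, here's roughly what I mean by the term-vector approach, as a minimal untested sketch against the Lucene 2.x contrib highlighter (the field name "contents" and the wrapper class are just placeholders I made up; TokenSources, TermPositionVector, Highlighter, and QueryScorer are the stock core/contrib APIs as I understand them):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.TokenSources;

public class VectorHighlightSketch {

    // Index time: store the field with positions + offsets so the
    // highlighter can reconstruct tokens without re-analyzing.
    public static Field makeHighlightableField(String text) {
        return new Field("contents", text,
                Field.Store.YES, Field.Index.TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS);
    }

    // Highlight time: rebuild the TokenStream from the stored term
    // vector, so StandardTokenizer never runs on the hot path.
    public static String highlight(IndexReader reader, int docId, Query query)
            throws Exception {
        Document doc = reader.document(docId);
        String text = doc.get("contents");
        TermPositionVector tpv =
                (TermPositionVector) reader.getTermFreqVector(docId, "contents");
        TokenStream ts = TokenSources.getTokenStream(tpv);
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        return highlighter.getBestFragment(ts, text);
    }
}

The tradeoff, as far as I can tell, is index size (WITH_POSITIONS_OFFSETS stores extra data per field) in exchange for skipping analysis entirely at highlight time.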
-M