> This "// TODO: make sure these returned charsref are immutable?" is a good
> point, because now they're very mutable, referring to internal preallocated
> buffers in Stemmer which are constantly reused.
>

You'd need to copy them or otherwise make sure they remain constant while in the cache, obviously.

> In cache-all condition, you ignore the maxSize intentionally, right?
>

Yes - I was just trying to figure out how many tokens there are in that sample.

> I've reproduced your results for English. I also checked German and
> French, which have compounds and more advanced inflection. They're improved
> as well, but not so much (30-40% on cache=10000, while calling native
> Hunspell via JNI is 2-4 times faster).
>

Sure. Like I said, it was just an idea off the top of my head.

D.
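
To illustrate the "copy them before caching" point, here is a minimal sketch. It is not the actual Stemmer code: `ReusingStemmer`, `StemCache` and the toy "strip a trailing s" rule are hypothetical stand-ins, and a plain `StringBuilder` plays the role of the reused internal buffer. With Lucene's own `CharsRef`, something like `CharsRef.deepCopyOf` would presumably serve the same purpose as the `toString()` copy below.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StemCacheSketch {

    /** Hypothetical stand-in for a stemmer that reuses one internal buffer for every result. */
    static final class ReusingStemmer {
        private final StringBuilder scratch = new StringBuilder();

        /** Returns a view onto the shared buffer; its contents change on the next call. */
        CharSequence stem(String word) {
            scratch.setLength(0);
            // Toy rule for illustration only: strip a trailing "s".
            scratch.append(word.endsWith("s") ? word.substring(0, word.length() - 1) : word);
            return scratch;
        }
    }

    /** Small LRU cache that stores immutable copies of the stems, never the shared buffer. */
    static final class StemCache {
        private final ReusingStemmer stemmer = new ReusingStemmer();
        private final Map<String, String> cache;

        StemCache(final int maxSize) {
            // accessOrder = true makes the map iterate from least to most recently used,
            // so removeEldestEntry evicts the least recently used word.
            this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxSize;
                }
            };
        }

        String stem(String word) {
            String cached = cache.get(word);
            if (cached == null) {
                // toString() copies the characters out of the reused buffer,
                // so later stemmer calls cannot change the cached value.
                cached = stemmer.stem(word).toString();
                cache.put(word, cached);
            }
            return cached;
        }

        int size() {
            return cache.size();
        }
    }

    public static void main(String[] args) {
        StemCache cache = new StemCache(2);
        String first = cache.stem("cats");
        cache.stem("dogs"); // overwrites the stemmer's buffer, not the cached copy
        System.out.println(first);
        System.out.println(cache.stem("cats"));
    }
}
```

Storing the `CharSequence` returned by `stem` directly would leave every cache entry pointing at the same buffer, which is exactly the mutability problem raised in the TODO.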
