> This "// TODO: make sure these returned charsref are immutable?" is a good
> point, because now they're very mutable, referring to internal preallocated
> buffers in Stemmer which are constantly reused.
>

You'd need to copy them or otherwise make sure they remain constant while
in the cache, obviously.
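
For illustration, here is a minimal sketch of that idea. It is not Lucene code: the StemCache class and its methods are hypothetical. It assumes the stemmer hands back stems as CharSequence views over buffers it reuses (CharsRef implements CharSequence), and that a plain LinkedHashMap in access order is an acceptable bounded LRU. Each stem is copied into an immutable String before it goes into the cache, so later reuse of the stemmer's buffers cannot change cached entries.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bounded LRU cache of stemming results (word -> stems).
class StemCache {
    private final Map<String, List<String>> cache;

    StemCache(int maxSize) {
        // accessOrder = true keeps the least-recently-used entry first, so
        // removeEldestEntry evicts it once the cache grows beyond maxSize.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached stems for a word, or null if the word is not cached.
    List<String> get(String word) {
        return cache.get(word);
    }

    // Copies every stem out of its (possibly reused) buffer before caching,
    // and stores an unmodifiable list so callers cannot mutate the entry.
    void put(String word, List<? extends CharSequence> stems) {
        List<String> copies = new ArrayList<>(stems.size());
        for (CharSequence stem : stems) {
            copies.add(stem.toString()); // toString() copies the characters
        }
        cache.put(word, Collections.unmodifiableList(copies));
    }

    public static void main(String[] args) {
        // Stands in for the stemmer's internal, constantly reused buffer.
        char[] shared = "walking".toCharArray();
        StemCache cache = new StemCache(2);
        cache.put("walking", List.of(CharBuffer.wrap(shared, 0, 4)));
        shared[0] = 'x'; // the stemmer overwrites its buffer on the next call
        System.out.println(cache.get("walking")); // still prints [walk]
    }
}
```

Without the toString() copy, the cached entry would read "xalk" after the buffer is overwritten, which is exactly the mutability problem with the current TODO.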


> In the cache-all condition, you ignore the maxSize intentionally, right?
>

Yes, I was just trying to figure out how many tokens there are in that
sample.


> I've reproduced your results for English. I also checked German and
> French, which have compounds and more advanced inflection. They're improved
> as well, but not by as much (30-40% with cache=10000, while calling native
> Hunspell via JNI is 2-4 times faster).
>

Sure. Like I said, it was just an idea off the top of my head.

D.
