Otis Gospodnetic wrote:
> I wonder about SnowballAnalyzer and SnowballFilter classes.
> The ctor of the latter uses introspection to instantiate the
> appropriate Stemmer. In most use cases that will be the same Stemmer
> from call to call. Seems like redundant work and objects created.
> Wouldn't it be better to have SnowballFilter 'cache' instances of
> previously instantiated Stemmers?
> I guess that would require that Snowball's Stemmers are thread
> safe....are they?

Compared to all of the tokens & strings that will be allocated when it is used, the allocation of the stemmer should not be significant. And the stemmers are not thread safe anyway.
I don't particularly like the use of introspection either. I copied it from Snowball's sample code. Unfortunately there's no other way to do this without modifying the Snowball code, which I'd rather not do. Currently this project incorporates the Snowball code as-is, so that if/when the Snowball project updates things it should be very easy to integrate those updates.
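For readers unfamiliar with the pattern, here is a rough, self-contained sketch of what instantiating a stemmer by name through introspection can look like. It is not the project's actual code: the class ReflectiveStemmerSketch, the nested EnglishStemmer stand-in, and its toy suffix rule are all hypothetical, and only the general shape (look the class up by name, create one instance per filter, call its methods reflectively) is meant to carry over.

```java
import java.lang.reflect.Method;

public class ReflectiveStemmerSketch {

    // Hypothetical stand-in for a generated Snowball stemmer; NOT the real class.
    public static class EnglishStemmer {
        private String current = "";
        public void setCurrent(String s) { current = s; }
        public String getCurrent() { return current; }
        // Toy rule for illustration only: strip a single trailing "s".
        public boolean stem() {
            if (current.length() > 1 && current.endsWith("s")) {
                current = current.substring(0, current.length() - 1);
                return true;
            }
            return false;
        }
    }

    // One stemmer instance per filter, so nothing is shared between threads.
    private final Object stemmer;
    private final Method setCurrent;
    private final Method getCurrent;
    private final Method stem;

    // Resolve the stemmer class from its name (e.g. "English") and
    // look up the methods once, in the constructor.
    public ReflectiveStemmerSketch(String name) throws ReflectiveOperationException {
        String className = ReflectiveStemmerSketch.class.getName() + "$" + name + "Stemmer";
        Class<?> cls = Class.forName(className);
        stemmer = cls.getDeclaredConstructor().newInstance();
        setCurrent = cls.getMethod("setCurrent", String.class);
        getCurrent = cls.getMethod("getCurrent");
        stem = cls.getMethod("stem");
    }

    // Stem a single term by driving the stemmer through reflection.
    public String stem(String term) throws ReflectiveOperationException {
        setCurrent.invoke(stemmer, term);
        stem.invoke(stemmer);
        return (String) getCurrent.invoke(stemmer);
    }

    public static void main(String[] args) throws Exception {
        ReflectiveStemmerSketch filter = new ReflectiveStemmerSketch("English");
        System.out.println(filter.stem("stemmers")); // prints "stemmer"
    }
}
```

In this sketch the reflective lookup happens once per filter instance, which is small next to the per-token work, and no stemmer is ever shared across threads.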
This project is still a work in progress. I want to do some benchmarking, more testing and add better documentation before I make a release and announce its availability. If the benchmarking shows major performance problems, then I may have to look at optimizing the Snowball code, but I hope to avoid that.
Doug
