Hello all,
I had a look at the oro code, and it looks like Perl5Matcher construction is not expensive at all. GC time might increase because of the frequent object construction, but I do not think it should create a problem. I didn't write any performance tests to check whether using ThreadLocal is faster than constructing a new matcher in this case, but for me the important thing is that it now works correctly.
I have an additional question: RegexUrlNormalizer has its normalize() method synchronized. The only thing inside this method that needs synchronization is exactly the same oro usage problem. Because normalize() is synchronized we do not get exceptions, but fetcher threads might be slowed down by contending for the lock.
Exactly the same solution (a synchronized method) is used in RegexUrlFilter.
If someone who knows this code can cross-check my findings and thinks it would be useful to change it, I can prepare a patch (using ThreadLocal, so that we have the same solution in all places).
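To illustrate the idea, here is a minimal sketch rather than the actual Nutch code. It assumes java.util.regex as a stand-in for the oro classes (its Matcher is likewise stateful and not thread-safe), and the class name, pattern and URL are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the real RegexUrlNormalizer.
final class ThreadLocalMatcherSketch {
    // A compiled Pattern is immutable and can be shared by all threads.
    private static final Pattern SESSION_ID =
        Pattern.compile(";jsessionid=[0-9A-Za-z]+");

    // A Matcher holds mutable match state, so each thread caches its own.
    private static final ThreadLocal<Matcher> MATCHER =
        ThreadLocal.withInitial(() -> SESSION_ID.matcher(""));

    // No 'synchronized' here: the only mutable object is per-thread.
    static String normalize(String url) {
        // reset(url) points this thread's matcher at the new input.
        return MATCHER.get().reset(url).replaceAll("");
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
            normalize("http://example.com/a;jsessionid=ABC123?x=1"));
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

With this shape, fetcher threads never wait on a shared lock in normalize(), and only one matcher is allocated per thread instead of one per call.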
Regards
Piotr
Doug Cutting wrote:
I just committed a variation of this patch. Instead of allocating a new Perl5Matcher for each call, I used a ThreadLocal to cache one Perl5Matcher per thread.
Thanks!
Doug
