Briggs wrote:
What is the design contract on plugins when it comes to thread safety?
I was under the assumption that plugins should be thread safe, but I
have been running into concurrent modification exceptions from the
language identifier plugin while indexing.  My application is a bit

They should be thread-safe. For example, Fetcher runs many threads in parallel, and each thread uses plugins to handle fetching, parsing, URL filtering, and so on.


different from the normal Nutch way.  I have many crawls going on
concurrently within an application.  So, that means I would also have
many concurrent indexing tasks.  So, if I can't be guaranteed that
plugins are thread-safe, I may need to do a nasty thing and synchronize
my index() method (ouch).


Here is the exception, just for info:

java.util.ConcurrentModificationException
       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
       at java.util.HashMap$ValueIterator.next(HashMap.java:817)
       at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)

This is a bug. My guess is that NGramProfile.getSorted() should be synchronized. Could you please test if this works?
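
To illustrate the kind of fix I have in mind, here is a minimal sketch. It is not the actual NGramProfile code: the class SharedProfileSketch, its HashMap field and the add()/total() methods are made up for the example, and only the method name getSorted() is borrowed from the real class. The assumption is that the exception comes from one thread iterating over a shared HashMap while another thread modifies it, so every method that touches the map takes the same lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a profile object shared between threads.
// This is NOT the real NGramProfile; names and fields are assumptions.
class SharedProfileSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Writers take the object's monitor so they cannot run during an iteration.
    public synchronized void add(String ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Readers that iterate over the map take the same monitor.
    // Without "synchronized" here, a concurrent add() could trigger
    // ConcurrentModificationException while the keys are being copied.
    public synchronized List<String> getSorted() {
        List<String> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    // Sums all counts; iterates over the map's values, so it is locked too.
    public synchronized int total() {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedProfileSketch profile = new SharedProfileSketch();
        List<Thread> threads = new ArrayList<>();
        // Four threads write to and read from the same profile instance.
        for (int t = 0; t < 4; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    profile.add("g" + (i % 50));
                    profile.getSorted();
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("total n-grams counted: " + profile.total());
    }
}
```

Synchronizing only getSorted() helps only if the code that modifies the map holds the same lock, which is why the sketch locks the writer as well. This is still much finer-grained than synchronizing the whole index() method in the application.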

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
