Briggs wrote:
> What is the design contract on plugins when it comes to thread safety?
> I was under the assumption that plugins should be thread-safe, but I
> have been running into ConcurrentModificationExceptions from the
> language identifier plugin while indexing.

They should be thread-safe. E.g. Fetcher runs many threads in parallel,
each thread using plugins to handle fetching, parsing, URL filtering,
etc.

> My application is a bit different from the normal Nutch way. I have many
> crawls going on concurrently within an application, which means I also
> have many concurrent indexing tasks. So if I can't be guaranteed that
> plugins are thread-safe, I may need to do a nasty thing and synchronize
> my index() method (ouch).
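Briggs's fallback, serializing `index()`, can be sketched as a wrapper that holds one lock around every call. This is a minimal sketch assuming a hypothetical `Indexer` interface, not Nutch's real indexing API; `SynchronizedIndexer`, `CountingIndexer` and `SynchronizedIndexerDemo` are illustrative names. `CountingIndexer` records how many threads were inside `index()` at once, which makes the cost visible: with the wrapper, concurrent indexing tasks queue on a single lock and the peak stays at one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical indexing step; NOT Nutch's actual indexing API.
interface Indexer {
    String index(String document);
}

// Serializes all calls: at most one thread is inside delegate.index() at a time.
final class SynchronizedIndexer implements Indexer {
    private final Indexer delegate;

    SynchronizedIndexer(Indexer delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized String index(String document) {
        return delegate.index(document);
    }
}

// Test double that records the peak number of threads inside index().
final class CountingIndexer implements Indexer {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    @Override
    public String index(String document) {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(1); // widen the window so any overlap would show up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
        }
        return document.toUpperCase();
    }

    int peak() {
        return peak.get();
    }
}

public class SynchronizedIndexerDemo {
    // Submits `tasks` index() calls to a pool of `threads` workers and collects the results.
    static List<String> runConcurrently(Indexer indexer, int threads, int tasks) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            String doc = "doc-" + i;
            pool.execute(() -> results.add(indexer.index(doc)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        CountingIndexer counting = new CountingIndexer();
        List<String> out = runConcurrently(new SynchronizedIndexer(counting), 4, 50);
        System.out.println(out.size() + " documents, peak concurrency " + counting.peak());
    }
}
```

The wrapper is safe regardless of what the plugins do internally, but it throws away the parallelism of running many crawls at once, which is why fixing the plugin itself is preferable.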
> Here is the exception, just for info:
>
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
>         at java.util.HashMap$ValueIterator.next(HashMap.java:817)
>         at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
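The trace shows a `HashMap` value iterator failing inside `NGramProfile.normalize()`. `HashMap` iterators are fail-fast: if the map is structurally modified after the iterator is created (other than through the iterator itself), a later `next()` throws `ConcurrentModificationException`. The `HashMap` documentation describes this as best-effort, so across threads it may or may not fire. The sketch below uses a hypothetical `FailFastDemo` class and runs on a single thread so the outcome is deterministic; in the reported case the modification presumably comes from another indexing thread touching the same profile.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Iterates over a HashMap's values while adding a key: the same
    // iterate-while-mutating pattern as in the trace, but on one thread
    // so the result is deterministic. Returns true if the iterator failed fast.
    static boolean mutateWhileIterating() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ab", 1);
        counts.put("bc", 2);
        counts.put("cd", 3);
        try {
            for (Integer value : counts.values()) {
                // Adding a new key is a structural modification of the map.
                counts.put("new-" + value, value);
            }
        } catch (ConcurrentModificationException e) {
            // The iterator's next call to next() detected the change and threw.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("threw ConcurrentModificationException: " + mutateWhileIterating());
    }
}
```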
This is a bug. My guess is that NGramProfile.getSorted() should be
synchronized. Could you please test if this works?
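The suggested fix, making `getSorted()` synchronized, can be sketched with a hypothetical `NGramCounter` class loosely modeled on an n-gram profile; it is not Nutch's `NGramProfile` code. One assumption worth stating: `synchronized` only excludes other code that locks the same monitor, so in this sketch the mutating `add()` is synchronized as well, and `getSorted()` returns a copy so callers never iterate the live map. Whether the real class needs the same depends on where its map is modified, which the trace alone does not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical class loosely modeled on an n-gram profile; NOT Nutch's NGramProfile.
public class NGramCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Writer: mutates the map, so it takes the same lock (this) as the reader.
    public synchronized void add(String ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Reader: iterates the map while holding the lock, so no add() can interleave.
    // Returns a sorted snapshot, so callers never iterate the live map.
    public synchronized List<Map.Entry<String, Integer>> getSorted() {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sorted.add(Map.entry(e.getKey(), e.getValue()));
        }
        // Highest count first; ties broken alphabetically.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    // Runs `writers` threads that each add "ab" `perWriter` times and "bc" once,
    // plus one reader thread calling getSorted() at the same time.
    static NGramCounter fillConcurrently(int writers, int perWriter) {
        NGramCounter counter = new NGramCounter();
        ExecutorService pool = Executors.newFixedThreadPool(writers + 1);
        for (int t = 0; t < writers; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perWriter; i++) {
                    counter.add("ab");
                }
                counter.add("bc");
            });
        }
        pool.execute(() -> {
            for (int i = 0; i < 100; i++) {
                counter.getSorted();
            }
        });
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000).getSorted());
    }
}
```

Locking on the instance keeps the change local to one class; a `ConcurrentHashMap` would be an alternative, but its iterators are weakly consistent rather than a true snapshot, so a sorted result taken during updates could mix old and new counts.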
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com