What is the design contract for plugins when it comes to thread safety?
I was under the assumption that plugins should be thread-safe, but I
have been running into ConcurrentModificationExceptions from the
language identifier plugin while indexing. My application is a bit
different from the normal Nutch setup: I have many crawls running
concurrently within a single application, which means I also have
many concurrent indexing tasks. If I can't be guaranteed that
plugins are thread-safe, I may need to do a nasty thing and synchronize
my index() method (ouch).
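
To make the workaround concrete, here is a minimal sketch of a narrower
alternative to synchronizing all of index(): wrap only the component that
is not known to be thread-safe so that calls into it are serialized. The
names TextFilter, CountingFilter and SerializedFilter are hypothetical
stand-ins for illustration, not Nutch classes, and whether this fits the
real plugin interfaces is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a plugin interface; not a Nutch API. */
interface TextFilter {
    String apply(String text);
}

/** Hypothetical plugin that keeps unsynchronized mutable state. */
final class CountingFilter implements TextFilter {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public String apply(String text) {
        // Unsafe if called from several threads at once.
        counts.merge(text, 1, Integer::sum);
        return text;
    }

    int count(String text) {
        return counts.getOrDefault(text, 0);
    }
}

/** Serializes every call to a delegate that may not be thread-safe. */
final class SerializedFilter implements TextFilter {
    private final TextFilter delegate;
    private final Object lock = new Object();

    SerializedFilter(TextFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    public String apply(String text) {
        // Only one thread at a time may enter the delegate.
        synchronized (lock) {
            return delegate.apply(text);
        }
    }
}
```

The lock only helps if every caller goes through the same wrapper
instance, and it still makes that one step a bottleneck; it just keeps
the rest of the indexing work concurrent.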
Here is the exception, just for info:
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
    at java.util.HashMap$ValueIterator.next(HashMap.java:817)
    at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
    at org.apache.nutch.analysis.lang.NGramProfile.analyze(NGramProfile.java:244)
    at org.apache.nutch.analysis.lang.LanguageIdentifier.identify(LanguageIdentifier.java:409)
    at org.apache.nutch.analysis.lang.LanguageIndexingFilter.filter(LanguageIndexingFilter.java:84)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:131)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:240)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
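
For reference, the top two frames are HashMap's fail-fast iterator
noticing that the map was structurally changed while it was being
iterated. The sketch below triggers the same exception type in a single
thread; it only illustrates the mechanism and assumes nothing about what
NGramProfile actually does internally. CmeDemo is a hypothetical name.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class CmeDemo {
    /** Returns true if iterating the values while adding a key throws. */
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        boolean first = true;
        try {
            for (Integer value : map.values()) {
                if (first) {
                    // Structural modification during iteration.
                    map.put("d", 4);
                    first = false;
                }
            }
        } catch (ConcurrentModificationException e) {
            // The iterator detected the change on its next step.
            return true;
        }
        return false;
    }
}
```

With multiple threads the same check fires when one thread iterates a
shared HashMap while another adds or removes entries, although in that
case the failure is timing-dependent rather than guaranteed.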
--briggs
"Conscious decisions by conscious minds are what make reality real"
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers