Hi guys,
I know, it's me again. I have been testing Nutch thoroughly lately, and here are some threading issues that I found. I am running version 0.8.2-dev.

When Nutch is first run (either from the script or from Ant), the fetcher defaults to 10 threads. This is good for performance, since a large number of URLs can be indexed quickly. The problem is that some plugins are not thread safe (or is it the fetcher itself that is not thread safe?).

I am running the parse-xml plugin (Nutch-185) and have hit the following issue: when running with multiple threads, such as the default of 10, the stored fields and values are inconsistent. The first 6 documents are indexed without problems, then 4 have errors, then 4 are correct, then some number have errors, and so forth.

At first I couldn't see where the problem was, but after several debugging sessions I realized that it could be a threading issue. I then ran Nutch with the minimum of 1 thread, and the fields were stored without any issues.

I can't draw a firm conclusion from this, but I think the code that Nutch runs in the fetcher threads is not thread safe. I could be wrong, so I welcome any replies.

Regards,

Armel

-------------------------------------------------
Armel T. Nene
iDNA Solutions
Tel: +44 (207) 257 6124
Mobile: +44 (788) 695 0483
http://blog.idna-solutions.com
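
P.S. To make the suspicion concrete, here is a minimal sketch of the kind of bug I have in mind. It is not taken from Nutch or from the parse-xml plugin: the Parser interface, UnsafeParser, SafeParser and countMismatches are all hypothetical names made up for illustration. The assumption is that one parser instance is shared by all fetcher threads and keeps per-document state in an instance field, so one thread can overwrite a value while another is still using it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParserThreadSafetySketch {

    /** Hypothetical parser interface; NOT part of Nutch. */
    interface Parser {
        String parse(String docId);
    }

    /** Keeps per-document state in an instance field, so threads can overwrite each other. */
    static class UnsafeParser implements Parser {
        private String currentId; // shared by every thread that uses this instance

        public String parse(String docId) {
            currentId = docId;
            Thread.yield(); // widen the window in which another thread may overwrite currentId
            return docId + "=" + currentId;
        }
    }

    /** Keeps per-document state in a local variable, so each call is isolated. */
    static class SafeParser implements Parser {
        public String parse(String docId) {
            String id = docId;
            Thread.yield();
            return docId + "=" + id;
        }
    }

    /** Parses `docs` documents on `threads` threads; counts results whose stored value differs from the input. */
    static int countMismatches(final Parser parser, int threads, int docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < docs; i++) {
                final String docId = "doc" + i;
                Callable<String> task = () -> parser.parse(docId);
                results.add(pool.submit(task));
            }
            int mismatches = 0;
            for (Future<String> f : results) {
                String[] parts = f.get().split("=");
                if (!parts[0].equals(parts[1])) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // The unsafe count with 10 threads varies from run to run; the other two are always 0.
        System.out.println("unsafe, 10 threads: " + countMismatches(new UnsafeParser(), 10, 10000));
        System.out.println("unsafe, 1 thread:   " + countMismatches(new UnsafeParser(), 1, 10000));
        System.out.println("safe, 10 threads:   " + countMismatches(new SafeParser(), 10, 10000));
    }
}
```

If the plugin does something similar, that would fit what I am seeing: correct results with 1 thread and a mix of correct and corrupted documents with 10. Moving the per-document state into local variables (or creating one parser instance per thread) would be the usual fix, but I have not confirmed that this is the actual cause.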